Deoxyribonucleic acid (DNA) is a nucleic acid that 
contains the genetic instructions used in the 
development and functioning of all known living 
organisms and some viruses. The main role of DNA 
molecules is the long-term storage of information. 
DNA is often compared to a set of blueprints or a 
recipe, or a code, since it contains the instructions 
needed to construct other components of cells, such 
as proteins and RNA molecules. The DNA segments 
that carry this genetic information are called genes, 
but other DNA sequences have structural purposes, 
or are involved in regulating the use of this genetic 
information. 

Chemically, DNA consists of two long polymers of simple units called nucleotides, with
backbones made of sugars and phosphate groups joined by ester bonds. These two strands run
in opposite directions to each other and are therefore anti-parallel. Attached to each sugar
is one of four types of molecules called bases. It is the sequence of these four bases along
the backbone that encodes information. This information is read using the genetic code,
which specifies the sequence of the amino acids within proteins. The code is read by copying
stretches of DNA into the related nucleic acid RNA, in a process called transcription.

Within cells, DNA is organized into structures called chromosomes. These chromosomes are 
duplicated before cells divide, in a process called DNA replication. Eukaryotic organisms 
(animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and 
some of their DNA in the mitochondria. Prokaryotes (bacteria and archaea), however, store 
their DNA in the cell's cytoplasm. Within the chromosomes, chromatin proteins such as 
histones compact and organize DNA. These compact structures guide the interactions 
between DNA and other proteins, helping control which parts of the DNA are transcribed. 




The structure of part of a DNA double 
helix 
Properties 

DNA is a long polymer made 
from repeating units called 
nucleotides. [1] [2] [3] The DNA 
chain is 22 to 26 Angstroms 
wide (2.2 to 2.6 nanometres), 
and one nucleotide unit is 3.3 Å 
(0.33 nm) long. Although each 
individual repeating unit is very 
small, DNA polymers can be 
very large molecules containing 
millions of nucleotides. For 
instance, the largest human 
chromosome, chromosome 
number 1, is approximately 220 
The chemical structure of DNA. Hydrogen bonds are shown as 
dotted lines. 



In living organisms, DNA does not usually exist as a single molecule, but instead as a pair
of molecules that are held tightly together. These two long strands entwine like vines,
in the shape of a double helix. The nucleotide repeats contain both the segment of the
backbone of the molecule, which holds the chain together, and a base, which interacts with
the other DNA strand in the helix. In general, a base linked to a sugar is called a
nucleoside and a base linked to a sugar and one or more phosphate groups is called a
nucleotide. If multiple nucleotides are linked together, as in DNA, this polymer is called
a polynucleotide.

The backbone of the DNA strand is made from alternating phosphate and sugar residues. 
The sugar in DNA is 2-deoxyribose, which is a pentose (five-carbon) sugar. The sugars are 
joined together by phosphate groups that form phosphodiester bonds between the third and 
fifth carbon atoms of adjacent sugar rings. These asymmetric bonds mean a strand of DNA 
has a direction. In a double helix the direction of the nucleotides in one strand is opposite 
to their direction in the other strand. This arrangement of DNA strands is called 
antiparallel. The asymmetric ends of DNA strands are referred to as the 5' (five prime) and 
3' (three prime) ends, with the 5' end being that with a terminal phosphate group and the 3' 
end that with a terminal hydroxyl group. One of the major differences between DNA and 
RNA is the sugar, with 2-deoxyribose being replaced by the alternative pentose sugar 
ribose in RNA. [7] 

The DNA double helix is stabilized by hydrogen bonds between the bases attached to the 
two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), 
guanine (G) and thymine (T). These four bases are attached to the sugar/phosphate to form 
the complete nucleotide, as shown for adenosine monophosphate. 
These bases are classified into two types; adenine and guanine are fused five- and 
six-membered heterocyclic compounds called purines, while cytosine and thymine are 
six-membered rings called pyrimidines. A fifth pyrimidine base, called uracil (U), usually 
takes the place of thymine in RNA and differs from thymine by lacking a methyl group on its 
ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine. 



Grooves 

Twin helical strands form the DNA backbone. Another double helix may be found by tracing
the spaces, or grooves, between the strands. These voids are adjacent to the base pairs and
may provide a binding site. As the strands are not directly opposite each other, the grooves
are unequally sized. One groove, the major groove, is 22 Å wide and the other, the minor
groove, is 12 Å wide. The narrowness of the minor groove means that the edges of the bases
are more accessible in the major groove. As a result, proteins like transcription factors
that can bind to specific sequences in double-stranded DNA usually make contacts to the
sides of the bases exposed in the major groove. [12] This situation varies in unusual
conformations of DNA within the cell (see below), but the major and minor grooves are always
named to reflect the differences in size that would be seen if the DNA is twisted back into
the ordinary B form.

Base pairing 




Structure of a section of DNA. The bases lie horizontally between the two spiraling
strands.



Each type of base on one strand forms a bond with just one type of base on the other strand.
This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines,
with A bonding only to T, and C bonding only to G. This arrangement of two nucleotides
binding together across the double helix is called a base pair. As hydrogen bonds are not
covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a
double helix can therefore be pulled apart like a zipper, either by a mechanical force or
high temperature. [13] As a result of this complementarity, all the information in the
double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA
replication. Indeed, this reversible and specific interaction between complementary base
pairs is critical for all the functions of DNA in living organisms. [2]
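The pairing rule described above (A with T, C with G, on antiparallel strands) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the code assumes an unambiguous sequence containing only the letters A, C, G and T.

```python
# Watson-Crick pairing rules as stated in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of each position, in the same written order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5' to 3'.

    The two strands are antiparallel, so the complement is reversed
    to read it in the conventional 5' to 3' direction.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "ATGGCA"
    print(complement(top))          # TACCGT
    print(reverse_complement(top))  # TGCCAT
```

Because each strand determines the other, applying `reverse_complement` twice returns the original sequence, which is the sense in which the information is duplicated on each strand.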
Top, a GC base pair with three hydrogen bonds. Bottom, an AT base pair with two 
hydrogen bonds. Non-covalent hydrogen bonds between the pairs are shown as dashed 
lines. 

The two types of base pairs form different numbers of hydrogen bonds, AT forming two 
hydrogen bonds, and GC forming three hydrogen bonds (see figure). DNA with high 
GC-content is more stable than DNA with low GC-content, but contrary to popular belief, 
this is not due to the extra hydrogen bond of a GC base pair but rather the contribution of 
stacking interactions (hydrogen bonding merely provides specificity of the pairing, not 
stability). As a result, it is both the percentage of GC base pairs and the overall length of 
a DNA double helix that determine the strength of the association between the two strands 
of DNA. Long DNA helices with a high GC content have stronger-interacting strands, while 
short helices with high AT content have weaker-interacting strands. In biology, parts of 
the DNA double helix that need to separate easily, such as the TATAAT Pribnow box in 
some promoters, tend to have a high AT content, making the strands easier to pull apart. 
In the laboratory, the strength of this interaction can be measured by finding the 
temperature required to break the hydrogen bonds, their melting temperature (also called 
the Tm value). When all the base pairs in a DNA double helix melt, the strands separate and 
exist in solution as two entirely independent molecules. These single-stranded DNA 
molecules have no single common shape, but some conformations are more stable than 
others. [17] 
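As a rough illustration of how GC content and length both feed into strand stability, the sketch below computes the GC fraction of a sequence and a rule-of-thumb melting temperature. The formula used, 2 °C per A or T plus 4 °C per G or C (often called the Wallace rule), is not taken from this article. It is an assumption brought in for illustration, and it is only a coarse estimate for short oligonucleotides. The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees Celsius.

    Assumption (not from the article): Tm = 2*(A+T) + 4*(G+C),
    a coarse estimate intended only for short oligonucleotides.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in [("Pribnow box", "TATAAT"), ("GC-rich", "GCGCGC")]:
        print(name, gc_content(seq), wallace_tm(seq))
```

Under this estimate the AT-only Pribnow box sequence TATAAT scores lower than a GC-rich sequence of the same length, in line with the text's remark that AT-rich regions are easier to pull apart.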
Sense and antisense 

A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA
copy that is translated into protein. [18] The sequence on the opposite strand is called the
"antisense" sequence. Both sense and antisense sequences can exist on different parts of
the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In
both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions
of these RNAs are not entirely clear. [19] One proposal is that antisense RNAs are involved
in regulating gene expression through RNA-RNA base pairing. [20]



A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses,
blur the distinction between sense and antisense strands by having overlapping genes.
In these cases, some DNA sequences do double duty, encoding one protein when read along
one strand, and a second protein when read in the opposite direction along the other
strand. In bacteria, this overlap may be involved in the regulation of gene transcription,
[22] while in viruses, overlapping genes increase the amount of information that can be
encoded within the small viral genome.



Supercoiling 

DNA can be twisted like a rope in a process called DNA supercoiling. With DNA in its
"relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base
pairs, but if the DNA is twisted the strands become more tightly or more loosely wound.
If the DNA is twisted in the direction of the helix, this is positive supercoiling, and the bases
are held more tightly together. If they are twisted in the opposite direction, this is negative
supercoiling, and the bases come apart more easily. In nature, most DNA has slight
negative supercoiling that is introduced by enzymes called topoisomerases. These
enzymes are also needed to relieve the twisting stresses introduced into DNA strands
during processes such as transcription and DNA replication. [26]
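The figure of one turn per 10.4 base pairs gives a simple way to estimate how many turns a relaxed molecule of a given length should contain, and to classify a molecule as over- or under-wound. The sketch below is a minimal illustration. The "superhelical density" it computes (the fractional excess or deficit of turns) is a standard quantity that is not defined in this article, and the function names and the tolerance value are assumptions.

```python
BP_PER_TURN = 10.4  # relaxed state, value quoted in the text


def relaxed_turns(n_bp: int) -> float:
    """Expected number of helical turns for a relaxed molecule of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def superhelical_density(n_bp: int, turns: float) -> float:
    """Fractional excess (+) or deficit (-) of turns relative to the relaxed state."""
    expected = relaxed_turns(n_bp)
    return (turns - expected) / expected


def classify(n_bp: int, turns: float, tol: float = 1e-9) -> str:
    """Label a molecule as 'positive', 'negative' or 'relaxed' supercoiled."""
    sigma = superhelical_density(n_bp, turns)
    if sigma > tol:
        return "positive"  # twisted in the direction of the helix
    if sigma < -tol:
        return "negative"  # twisted against the helix; strands separate more easily
    return "relaxed"


if __name__ == "__main__":
    print(relaxed_turns(1040))
    print(classify(1040, 94), superhelical_density(1040, 94))
```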



Alternate DNA structures 

DNA exists in many possible conformations that include A-DNA, B-DNA, and Z-DNA forms,
although only B-DNA and Z-DNA have been directly observed in functional organisms. [9]
The conformation that DNA adopts depends on the hydration level, DNA sequence, the amount
and direction of supercoiling, chemical modifications of the bases, the type and
concentration of metal ions, as well as the presence of polyamines in solution.






From left to right, the structures of A, B and Z DNA 



The first published reports of A-DNA X-ray diffraction patterns, and also of B-DNA, used
analyses based on Patterson transforms that provided only a limited amount of structural
information for oriented fibers of DNA. [28] [29] An alternate analysis was then proposed by
Wilkins et al., in 1953, for the in vivo B-DNA X-ray diffraction/scattering patterns of highly
hydrated DNA fibers in terms of squares of Bessel functions. In the same journal,
Watson and Crick presented their molecular modeling analysis of the DNA X-ray
diffraction patterns to suggest that the structure was a double-helix.

Although the B-DNA form is most common under the conditions found in cells, it is not 
a well-defined conformation but a family of related DNA conformations that occur at the 
high hydration levels present in living cells. Their corresponding X-ray diffraction and 
scattering patterns are characteristic of molecular paracrystals with a significant degree of 
disorder. [33] [34] 

Compared to B-DNA, the A-DNA form is a wider right-handed spiral, with a shallow, wide 
minor groove and a narrower, deeper major groove. The A form occurs under 
non-physiological conditions in partially dehydrated samples of DNA, while in the cell it 
may be produced in hybrid pairings of DNA and RNA strands, as well as in enzyme-DNA 
complexes. Segments of DNA where the bases have been chemically modified by 
methylation may undergo a larger change in conformation and adopt the Z form. Here, the 
strands turn about the helical axis in a left-handed spiral, the opposite of the more common 
B form. These unusual structures can be recognized by specific Z-DNA binding proteins 
and may be involved in the regulation of transcription. 



Quadruplex structures 




Structure of a DNA quadruplex formed by telomere repeats. The looped conformation of the
DNA backbone is very different from the typical helical structure. [39]



At the ends of the linear chromosomes are specialized regions of DNA called telomeres.
The main function of these regions is to allow the cell to replicate chromosome ends using
the enzyme telomerase, as the enzymes that normally replicate DNA cannot copy the extreme
3' ends of chromosomes. These specialized chromosome caps also help protect the DNA ends,
and stop the DNA repair systems in the cell from treating them as damage to be corrected.
In human cells, telomeres are usually lengths of single-stranded DNA containing several
thousand repeats of a simple TTAGGG sequence. [42]



These guanine-rich sequences may 
stabilize chromosome ends by forming structures of stacked sets of four-base units, rather 
than the usual base pairs found in other DNA molecules. Here, four guanine bases form a 
flat plate and these flat four-base units then stack on top of each other, to form a stable 
G-quadruplex structure. These structures are stabilized by hydrogen bonding between 
the edges of the bases and chelation of a metal ion in the centre of each four-base unit. 
Other structures can also be formed, with the central set of four bases coming from either a 
single strand folded around the bases, or several different parallel strands, each 
contributing one base to the central structure. 
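The tandem TTAGGG repeats mentioned above are easy to look for in sequence data. The following sketch, which assumes plain uppercase or lowercase A/C/G/T input and uses a hypothetical function name, reports the length of the longest uninterrupted run of the repeat unit. It says nothing about whether a quadruplex actually forms, which depends on conditions this article describes only qualitatively.

```python
import re

REPEAT = "TTAGGG"  # human telomere repeat unit quoted in the text


def longest_repeat_run(seq: str, unit: str = REPEAT) -> int:
    """Number of back-to-back copies of unit in the longest tandem run within seq."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = (match.group(0) for match in pattern.finditer(seq.upper()))
    return max((len(run) // len(unit) for run in runs), default=0)


if __name__ == "__main__":
    example = "ACGT" + "TTAGGG" * 4 + "CC" + "TTAGGG" * 2
    print(longest_repeat_run(example))
```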

In addition to these stacked structures, telomeres also form large loop structures called 
telomere loops, or T-loops. Here, the single-stranded DNA curls around in a long circle 
stabilized by telomere-binding proteins. At the very end of the T-loop, the 

single-stranded telomere DNA is held onto a region of double-stranded DNA by the 
telomere strand disrupting the double-helical DNA and base pairing to one of the two 
strands. This triple-stranded structure is called a displacement loop or D-loop. 

Branched DNA 
A DNA structure with a single branching point. 




A DNA structure with multiple branches. 



In DNA, fraying occurs when non-complementary regions exist at the end of an otherwise
complementary double-strand of DNA. However, branched DNA can occur if a third strand
of DNA is introduced and contains adjoining regions able to hybridize with the frayed
regions of the pre-existing double-strand. Although the simplest example of branched DNA
involves only three strands of DNA, complexes involving additional strands and multiple
branches are also possible. [46]



Chemical modifications 
Structure of cytosine with and without the 5-methyl group. After deamination the 
5-methylcytosine has the same structure as thymine 

Base modifications 

The expression of genes is influenced by how the DNA is packaged in chromosomes, in a 
structure called chromatin. Base modifications can be involved in packaging, with regions 
that have low or no gene expression usually containing high levels of methylation of 
cytosine bases. For example, cytosine methylation produces 5-methylcytosine, which is 
important for X-chromosome inactivation. The average level of methylation varies 
between organisms: the worm Caenorhabditis elegans lacks cytosine methylation, while 
vertebrates have higher levels, with up to 1% of their DNA containing 5-methylcytosine. 
Despite the importance of 5-methylcytosine, it can deaminate to leave a thymine base; 
methylated cytosines are therefore particularly prone to mutations. Other base 
modifications include adenine methylation in bacteria, the presence of 
5-hydroxymethylcytosine in the brain, and the glycosylation of uracil to produce the 
"J-base" in kinetoplastids. 



Damage 

DNA can be damaged by many different 
sorts of mutagens, which change the DNA 
sequence. Mutagens include oxidizing 
agents, alkylating agents and also 
high-energy electromagnetic radiation such 
as ultraviolet light and X-rays. The type of 
DNA damage produced depends on the 
type of mutagen. For example, UV light can 
damage DNA by producing thymine dimers, 
which are cross-links between pyrimidine 
bases. On the other hand, oxidants such 
as free radicals or hydrogen peroxide 
produce multiple forms of damage, 
including base modifications, particularly of 
guanosine, and double-strand breaks. A 
typical human cell contains about 150,000 
bases that have suffered oxidative 
damage. Of these oxidative lesions, the 
most dangerous are double-strand breaks, 
as these are difficult to repair and can 
produce point mutations, insertions and 
deletions from the DNA sequence, as well 
as chromosomal translocations. 




A covalent adduct between benzo[a]pyrene, the major mutagen in tobacco smoke, and DNA. [53]



Many mutagens fit into the space between two adjacent base pairs; this is called 
intercalation. Most intercalators are aromatic and planar molecules, and include ethidium 
bromide, daunomycin, and doxorubicin. In order for an intercalator to fit between base 
pairs, the bases must separate, distorting the DNA strands by unwinding of the double 
helix. This inhibits both transcription and DNA replication, causing toxicity and mutations. 
As a result, DNA intercalators are often carcinogens, and benzo[a]pyrene diol epoxide, 
acridines, aflatoxin and ethidium bromide are well-known examples. 
Nevertheless, due to their ability to inhibit DNA transcription and replication, other similar 
toxins are also used in chemotherapy to inhibit rapidly growing cancer cells. 

Biological functions 

DNA usually occurs as linear chromosomes in eukaryotes, and circular chromosomes in 
prokaryotes. The set of chromosomes in a cell makes up its genome; the human genome has 
approximately 3 billion base pairs of DNA arranged into 46 chromosomes. The 
information carried by DNA is held in the sequence of pieces of DNA called genes. 
Transmission of genetic information in genes is achieved via complementary base pairing. 
For example, in transcription, when a cell uses the information in a gene, the DNA 
sequence is copied into a complementary RNA sequence through the attraction between 
the DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to make a 
matching protein sequence in a process called translation which depends on the same 
interaction between RNA nucleotides. Alternatively, a cell may simply copy its genetic 
information in a process called DNA replication. The details of these functions are covered 
in other articles; here we focus on the interactions between DNA and other molecules that 
mediate the function of the genome. 

Genes and genomes 

Genomic DNA is located in the cell nucleus of eukaryotes, as well as small amounts in 
mitochondria and chloroplasts. In prokaryotes, the DNA is held within an irregularly shaped 
body in the cytoplasm called the nucleoid. The genetic information in a genome is held 
within genes, and the complete set of this information in an organism is called its genotype. 
A gene is a unit of heredity and is a region of DNA that influences a particular 
characteristic in an organism. Genes contain an open reading frame that can be 
transcribed, as well as regulatory sequences such as promoters and enhancers, which 
control the transcription of the open reading frame. 

In many species, only a small fraction of the total sequence of the genome encodes protein. 
For example, only about 1.5% of the human genome consists of protein-coding exons, with 
over 50% of human DNA consisting of non-coding repetitive sequences. The reasons for 
the presence of so much non-coding DNA in eukaryotic genomes and the extraordinary 
differences in genome size, or C-value, among species represent a long-standing puzzle 
known as the "C-value enigma." However, DNA sequences that do not code protein may 
still encode functional non-coding RNA molecules, which are involved in the regulation of 
gene expression. [66]
T7 RNA polymerase (blue) producing an mRNA (green) from a 
DNA template (orange). 



Some non-coding DNA sequences play structural roles in chromosomes. Telomeres and
centromeres typically contain few genes, but are important for the function and stability
of chromosomes. Pseudogenes, which are copies of genes that have been disabled by mutation,
are an abundant form of non-coding DNA in humans. These sequences are usually just
molecular fossils, although they can occasionally serve as raw genetic material for the
creation of new genes through the process of gene duplication and divergence. [70]



Transcription and translation 

A gene is a sequence of DNA that contains genetic information and can influence the 
phenotype of an organism. Within a gene, the sequence of bases along a DNA strand 
defines a messenger RNA sequence, which then defines one or more protein sequences. 
The relationship between the nucleotide sequences of genes and the amino-acid sequences 
of proteins is determined by the rules of translation, known collectively as the genetic code. 
The genetic code consists of three-letter 'words' called codons formed from a sequence of 
three nucleotides (e.g. ACT, CAG, TTT). 

In transcription, the codons of a gene are copied into messenger RNA by RNA polymerase. 
This RNA copy is then decoded by a ribosome that reads the RNA sequence by base-pairing 
the messenger RNA to transfer RNA, which carries amino acids. Since there are 4 bases in 
3-letter combinations, there are 64 possible codons (4³ combinations). These encode the 
twenty standard amino acids, giving most amino acids more than one possible codon. There 
are also three 'stop' or 'nonsense' codons signifying the end of the coding region; these are 
the TAA, TGA and TAG codons. 
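The codon arithmetic above can be checked directly. The sketch below enumerates every three-letter word over the four DNA bases, separates the three stop codons named in the text, and splits a coding sequence into codons up to the first in-frame stop. It deliberately does not include a codon-to-amino-acid table; the function name and the example sequence are illustrative assumptions.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TGA", "TAG"}  # the three stop codons listed in the text

# Every ordered triple of bases: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]
SENSE_CODONS = [codon for codon in ALL_CODONS if codon not in STOP_CODONS]


def split_codons(coding_seq: str) -> list[str]:
    """Split a coding-strand sequence into codons, stopping at the first stop codon.

    Reading starts at position 0; trailing bases that do not fill a codon are ignored.
    """
    codons = []
    for i in range(0, len(coding_seq) - 2, 3):
        codon = coding_seq[i:i + 3].upper()
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


if __name__ == "__main__":
    print(len(ALL_CODONS), len(SENSE_CODONS))  # 64 61
    print(split_codons("ATGACTCAGTAATTT"))     # ['ATG', 'ACT', 'CAG']
```

With 61 sense codons shared among twenty standard amino acids, most amino acids necessarily have more than one codon, as the text notes.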

Replication 

Cell division is essential for an organism to grow, but when a cell divides it must
replicate the DNA in its genome so that the two daughter cells have the same genetic
information as their parent. The double-stranded structure of DNA provides a simple
mechanism for DNA replication. Here, the two strands are separated and then each strand's
complementary DNA sequence is recreated by an enzyme called DNA polymerase. This enzyme
makes the complementary strand by finding the correct base through complementary base
pairing, and bonding it onto the original strand. As DNA polymerases can only extend a DNA
strand in a 5' to 3' direction, different mechanisms are used to copy the antiparallel
strands of the double helix. In this way, the base on the old strand dictates which base
appears on the new strand, and the cell ends up with a perfect copy of its DNA.



DNA replication. The double helix is unwound by a helicase and topoisomerase. Next, one DNA
polymerase produces the leading strand copy. Another DNA polymerase binds to the lagging
strand. This enzyme makes discontinuous segments (called Okazaki fragments) before DNA
ligase joins them together.



Interactions with proteins 

All the functions of DNA depend on interactions with proteins. These protein interactions 
can be non-specific, or the protein can bind specifically to a single DNA sequence. Enzymes 
can also bind to DNA and of these, the polymerases that copy the DNA base sequence in 
transcription and DNA replication are particularly important. 

DNA-binding proteins 
Interaction of DNA with histones (shown in white, top). These proteins' basic amino acids 
(below left, blue) bind to the acidic phosphate groups on DNA (below right, red). 

Structural proteins that bind DNA are well-understood examples of non-specific 
DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural 
proteins. These proteins organize the DNA into a compact structure called chromatin. In 
eukaryotes this structure involves DNA binding to a complex of small basic proteins called 
histones, while in prokaryotes multiple types of proteins are involved. The histones 
form a disk-shaped complex called a nucleosome, which contains two complete turns of 
double-stranded DNA wrapped around its surface. These non-specific interactions are 
formed through basic residues in the histones making ionic bonds to the acidic 
sugar-phosphate backbone of the DNA, and are therefore largely independent of the base 
sequence. Chemical modifications of these basic amino acid residues include 
methylation, phosphorylation and acetylation. These chemical changes alter the strength 
of the interaction between the DNA and the histones, making the DNA more or less 
accessible to transcription factors and changing the rate of transcription. Other 
non-specific DNA-binding proteins in chromatin include the high-mobility group proteins, 
which bind to bent or distorted DNA. These proteins are important in bending arrays of 
nucleosomes and arranging them into the larger structures that make up chromosomes. 

A distinct group of DNA-binding proteins are the DNA-binding proteins that specifically 
bind single-stranded DNA. In humans, replication protein A is the best-understood member 
of this family and is used in processes where the double helix is separated, including DNA 
replication, recombination and DNA repair. These binding proteins seem to stabilize 
single-stranded DNA and protect it from forming stem-loops or being degraded by 
nucleases. 



In contrast, other proteins have evolved to bind to 
particular DNA sequences. The most intensively 
studied of these are the various transcription factors, 
which are proteins that regulate transcription. Each 
transcription factor binds to one particular set of DNA 
sequences and activates or inhibits the transcription of 
genes that have these sequences close to their 
promoters. The transcription factors do this in two 
ways. Firstly, they can bind the RNA polymerase 
responsible for transcription, either directly or through 
other mediator proteins; this locates the polymerase at 
the promoter and allows it to begin transcription. 
Alternatively, transcription factors can bind enzymes 
that modify the histones at the promoter; this will 
change the accessibility of the DNA template to the 
polymerase. 

As these DNA targets can occur throughout an organism's genome, changes in the activity of
one type of transcription factor can affect thousands of genes. [83] Consequently, these
proteins are often the targets of the signal transduction processes that control responses
to environmental changes or cellular differentiation and development. The specificity of
these transcription factors' interactions with DNA comes from the proteins making multiple
contacts to the edges of the DNA bases, allowing them to "read" the DNA sequence. Most of
these base-interactions are made in the major groove, where the bases are most accessible.

The lambda repressor helix-turn-helix transcription factor bound to its DNA target




The restriction enzyme EcoRV (green) in a complex 
with its substrate DNA 



DNA-modifying enzymes 

Nucleases and ligases 

Nucleases are enzymes that cut DNA 
strands by catalyzing the hydrolysis of the 
phosphodiester bonds. Nucleases that 
hydrolyse nucleotides from the ends of 
DNA strands are called exonucleases, 
while endonucleases cut within strands. 
The most frequently used nucleases in 
molecular biology are the restriction 
endonucleases, which cut DNA at specific 
sequences. For instance, the EcoRV 
enzyme shown in the figure recognizes the 
6-base sequence 5'-GAT|ATC-3' and makes a cut at the vertical line. In nature, these 
enzymes protect bacteria against phage infection by digesting the phage DNA when it 
enters the bacterial cell, acting as part of the restriction modification system. In 
technology, these sequence-specific nucleases are used in molecular cloning and DNA 
fingerprinting. 
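A sequence-specific cut of this kind is straightforward to simulate on a single strand. The sketch below searches for the EcoRV site GATATC and cuts between GAT and ATC, as in the 5'-GAT|ATC-3' notation above. The function names and the example sequence are hypothetical, and the code models only where the cut falls, not the enzyme's chemistry or its dependence on reaction conditions.

```python
SITE = "GATATC"  # EcoRV recognition sequence from the text
CUT_OFFSET = 3   # cut falls after the third base: GAT|ATC


def find_sites(dna: str, site: str = SITE) -> list[int]:
    """Return the 0-based start index of every occurrence of site in dna."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, site: str = SITE, cut_offset: int = CUT_OFFSET) -> list[str]:
    """Cut dna at every recognition site and return the fragments in order."""
    dna = dna.upper()
    fragments = []
    previous = 0
    for pos in find_sites(dna, site):
        cut = pos + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


if __name__ == "__main__":
    print(digest("AAGATATCTTGATATCGG"))  # ['AAGAT', 'ATCTTGAT', 'ATCGG']
```

GATATC reads the same on both strands in the 5' to 3' direction, so scanning one strand is enough to locate every site in the duplex.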

Enzymes called DNA ligases can rejoin cut or broken DNA strands. [87] Ligases are 
particularly important in lagging strand DNA replication, as they join together the short 
segments of DNA produced at the replication fork into a complete copy of the DNA 
template. They are also used in DNA repair and genetic recombination. 

Topoisomerases and helicases 

Topoisomerases are enzymes with both nuclease and ligase activity. These proteins change 
the amount of supercoiling in DNA. Some of these enzymes work by cutting the DNA helix 
and allowing one section to rotate, thereby reducing its level of supercoiling; the enzyme 
then seals the DNA break. [25] Other types of these enzymes are capable of cutting one DNA 
helix and then passing a second strand of DNA through this break, before rejoining the 
helix. [88] Topoisomerases are required for many processes involving DNA, such as DNA 
replication and transcription. 

Helicases are proteins that are a type of molecular motor. They use the chemical energy in 
nucleoside triphosphates, predominantly ATP, to break hydrogen bonds between bases and 
unwind the DNA double helix into single strands. These enzymes are essential for most 
processes where enzymes need to access the DNA bases. 
Polymerases 

Polymerases are enzymes that synthesize polynucleotide chains from nucleoside 
triphosphates. The sequences of their products are copies of existing polynucleotide chains, 
which are called templates. These enzymes function by adding nucleotides onto the 3' 
hydroxyl group of the previous nucleotide in a DNA strand. Consequently, all polymerases 
work in a 5' to 3' direction. In the active site of these enzymes, the incoming nucleoside 
triphosphate base-pairs to the template: this allows polymerases to accurately synthesize 
the complementary strand of their template. Polymerases are classified according to the 
type of template that they use. 

In DNA replication, a DNA-dependent DNA polymerase makes a copy of a DNA sequence. 
Accuracy is vital in this process, so many of these polymerases have a proofreading activity. 
Here, the polymerase recognizes the occasional mistakes in the synthesis reaction by the 
lack of base pairing between the mismatched nucleotides. If a mismatch is detected, a 3' to
5' exonuclease activity is activated and the incorrect base removed. In most organisms
DNA polymerases function in a large complex called the replisome that contains multiple
accessory subunits, such as the DNA clamp or helicases. [92]

RNA-dependent DNA polymerases are a specialized class of polymerases that copy the 
sequence of an RNA strand into DNA. They include reverse transcriptase, which is a viral 
enzyme involved in the infection of cells by retroviruses, and telomerase, which is required 
for the replication of telomeres. Telomerase is an unusual polymerase because it
contains its own RNA template as part of its structure.

Transcription is carried out by a DNA-dependent RNA polymerase that copies the sequence 
of a DNA strand into RNA. To begin transcribing a gene, the RNA polymerase binds to a 
sequence of DNA called a promoter and separates the DNA strands. It then copies the gene 
sequence into a messenger RNA transcript until it reaches a region of DNA called the 
terminator, where it halts and detaches from the DNA. As with human DNA-dependent DNA 
polymerases, RNA polymerase II, the enzyme that transcribes most of the genes in the 
human genome, operates as part of a large protein complex with multiple regulatory and 
accessory subunits. [94]
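
As a rough illustration of the copying step in transcription, the sketch below maps a
DNA template strand onto its RNA transcript. It assumes the template is supplied in the
3' to 5' direction and ignores promoters, terminators and all protein factors; the names
transcribe and RNA_PAIR are hypothetical helpers, not part of any library.

```python
# DNA template base -> RNA base added to the transcript (uracil replaces thymine).
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA transcript (5' -> 3') for a template strand given 3' -> 5'."""
    return "".join(RNA_PAIR[base] for base in template_3to5.upper())


print(transcribe("TACGGC"))  # AUGCCG
```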



Genetic recombination 
Structure of the Holliday junction intermediate in genetic recombination. The four separate
DNA strands are coloured red, blue, green and yellow.

A DNA helix usually does not interact with other segments of DNA, and in human cells
the different chromosomes even occupy separate areas in the nucleus called
"chromosome territories". This physical separation of different chromosomes is
important for the ability of DNA to function as a stable repository for information, as
one of the few times chromosomes interact is during chromosomal crossover when
they recombine. Chromosomal crossover is when two DNA helices break, swap a
section and then rejoin.

Recombination allows chromosomes to exchange genetic information and produces new
combinations of genes, which increases the efficiency of natural selection and can be
important in the rapid evolution of new proteins. [97] Genetic recombination can also be
involved in DNA repair, particularly in the cell's response to double-strand breaks.

The most common form of chromosomal crossover is homologous recombination, where the
two chromosomes involved share very similar sequences. Non-homologous recombination
can be damaging to cells, as it can produce chromosomal translocations and genetic
abnormalities. The recombination reaction is catalyzed by enzymes known as recombinases,
such as RAD51. [99] The first step in recombination is a double-stranded break either caused
by an endonuclease or damage to the DNA. A series of steps catalyzed in part by the
recombinase then leads to joining of the two helices by at least one Holliday junction, in
which a segment of a single strand in each helix is annealed to the complementary strand in
the other helix. The Holliday junction is a tetrahedral junction structure that can be moved
along the pair of chromosomes, swapping one strand for another. The recombination
reaction is then halted by cleavage of the junction and re-ligation of the released DNA.
Evolution

DNA contains the genetic information that allows all modern living things to function, grow 
and reproduce. However, it is unclear how long in the 4-billion-year history of life DNA has 
performed this function, as it has been proposed that the earliest forms of life may have 
used RNA as their genetic material. RNA may have acted as the central part of early
cell metabolism as it can both transmit genetic information and carry out catalysis as part
of ribozymes. This ancient RNA world where nucleic acid would have been used for
both catalysis and genetics may have influenced the evolution of the current genetic code
based on four nucleotide bases. This would occur since the number of unique bases in such 
an organism is a trade-off between a small number of bases increasing replication accuracy 
and a large number of bases increasing the catalytic efficiency of ribozymes. 

Unfortunately, there is no direct evidence of ancient genetic systems, as recovery of DNA 
from most fossils is impossible. This is because DNA will survive in the environment for less
than one million years and slowly degrades into short fragments in solution. Claims for
older DNA have been made, most notably a report of the isolation of a viable bacterium
from a salt crystal 250 million years old, but these claims are controversial.

Uses in technology 

Genetic engineering 

Methods have been developed to purify DNA from organisms, such as phenol-chloroform
extraction, and to manipulate it in the laboratory, such as restriction digests and the
polymerase chain reaction. Modern biology and biochemistry make intensive use of these
techniques in recombinant DNA technology. Recombinant DNA is a man-made DNA
sequence that has been assembled from other DNA sequences. Such sequences can be
transformed into organisms in the form of plasmids or, in the appropriate format, by using
a viral vector. The genetically modified organisms produced can be used to produce products
such as recombinant proteins, used in medical research, or be grown in agriculture. [112]

Forensics 

Forensic scientists can use DNA in blood, semen, skin, saliva or hair found at a crime scene
to identify the matching DNA of an individual, such as a perpetrator. This process is called
genetic fingerprinting, or more accurately, DNA profiling. In DNA profiling, the lengths of
variable sections of repetitive DNA, such as short tandem repeats and minisatellites, are
compared between people. This method is usually an extremely reliable technique for
identifying matching DNA. However, identification can be complicated if the scene is
contaminated with DNA from several people. DNA profiling was developed in 1984 by
British geneticist Sir Alec Jeffreys, and first used in forensic science to convict Colin
Pitchfork in the 1988 Enderby murders case.
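
The comparison of repeat lengths can be sketched in Python as follows. This is a
deliberately simplified illustration: real profiling measures fragment lengths at many
loci and handles two alleles per locus, whereas this sketch counts repeats directly in
made-up sequences. The function names, the locus label and the example sequences are all
hypothetical.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def profiles_match(sample_a: dict, sample_b: dict) -> bool:
    """Profiles match when every shared locus has the same repeat count."""
    shared = sample_a.keys() & sample_b.keys()
    return bool(shared) and all(sample_a[k] == sample_b[k] for k in shared)


# Made-up sequences for a single hypothetical locus with a GATA repeat.
scene = {"locus1": longest_repeat_run("TTGATAGATAGATAGATACC", "GATA")}
suspect = {"locus1": longest_repeat_run("AAGATAGATAGATAGATAGG", "GATA")}
print(scene["locus1"], profiles_match(scene, suspect))  # 4 True
```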

People convicted of certain types of crimes may be required to provide a sample of DNA for 
a database. This has helped investigators solve old cases where only a DNA sample was 
obtained from the scene. DNA profiling can also be used to identify victims of mass casualty 
incidents. On the other hand, many convicted people have been released from prison on
the basis of DNA techniques, which were not available when a crime had originally been
committed.
Bioinformatics

Bioinformatics involves the manipulation, searching, and data mining of DNA sequence
data. The development of techniques to store and search DNA sequences has led to widely
applied advances in computer science, especially string searching algorithms, machine
learning and database theory. String searching or matching algorithms, which find an
occurrence of a sequence of letters inside a larger sequence of letters, were developed to
search for specific sequences of nucleotides. In other applications such as text editors,
even simple algorithms for this problem usually suffice, but DNA sequences cause these
algorithms to exhibit near-worst-case behaviour due to their small number of distinct
characters. The related problem of sequence alignment aims to identify homologous
sequences and locate the specific mutations that make them distinct. These techniques,
especially multiple sequence alignment, are used in studying phylogenetic relationships and
protein function. Data sets representing entire genomes' worth of DNA sequences, such
as those produced by the Human Genome Project, are difficult to use without annotations,
which label the locations of genes and regulatory elements on each chromosome. Regions
of DNA sequence that have the characteristic patterns associated with protein- or
RNA-coding genes can be identified by gene finding algorithms, which allow researchers to
predict the presence of particular gene products in an organism even before they have been
isolated experimentally.
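
String searching on nucleotide sequences can be illustrated with a short Python sketch
that reports every position at which a motif occurs, including overlapping occurrences.
It relies on the built-in str.find method rather than a specialised algorithm, and the
function name find_motif and the example sequence are illustrative assumptions.

```python
def find_motif(genome: str, motif: str) -> list[int]:
    """Return every 0-based start index where `motif` occurs, overlaps included."""
    hits = []
    start = genome.find(motif)
    while start != -1:
        hits.append(start)
        # Resume one position later so that overlapping matches are not skipped.
        start = genome.find(motif, start + 1)
    return hits


print(find_motif("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```

Because DNA uses only four letters, partial matches such as the repeated "ATA" above are
common, which is one reason simple search strategies do more work on DNA than on
ordinary text.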



DNA nanotechnology 

DNA nanotechnology uses the unique molecular recognition properties of DNA and other
nucleic acids to create self-assembling branched DNA complexes with useful properties.
DNA is thus used as a structural material rather than as a carrier of biological
information. This has led to the creation of two-dimensional periodic lattices (both
tile-based as well as using the "DNA origami" method) as well as three-dimensional
structures in the shapes of polyhedra. Nanomechanical devices and algorithmic
self-assembly have also been demonstrated, and these DNA structures have been used to
template the arrangement of other molecules such as gold nanoparticles and streptavidin
proteins. [126]

The DNA structure at left (schematic shown) will self-assemble into the structure
visualized by atomic force microscopy at right (scale bar: 100 nm). DNA nanotechnology is
the field which seeks to design nanoscale structures using the molecular recognition
properties of DNA molecules. Image from Strong, 2004. [122]



History and anthropology 

Because DNA collects mutations over time, which are then inherited, it contains historical
information, and by comparing DNA sequences, geneticists can infer the evolutionary
history of organisms, their phylogeny. This field of phylogenetics is a powerful tool in
evolutionary biology. If DNA sequences within a species are compared, population
geneticists can learn the history of particular populations. This can be used in studies
ranging from ecological genetics to anthropology; for example, DNA evidence is being used
to try to identify the Ten Lost Tribes of Israel. [128] [129]

DNA has also been used to look at modern family relationships, such as establishing family 
relationships between the descendants of Sally Hemings and Thomas Jefferson. This usage 
is closely related to the use of DNA in criminal investigations detailed above. Indeed, some 
criminal investigations have been solved when DNA from crime scenes has matched 
relatives of the guilty individual. 



History of DNA research 

DNA was first isolated by the Swiss physician Friedrich Miescher who, in 1869, discovered
a microscopic substance in the pus of discarded surgical bandages. As it resided in the
nuclei of cells, he called it "nuclein". In 1919, Phoebus Levene identified the base,
sugar and phosphate nucleotide unit. Levene suggested that DNA consisted of a string
of nucleotide units linked together through the phosphate groups. However, Levene
thought the chain was short and the bases repeated in a fixed order. In 1937 William
Astbury produced the first X-ray diffraction patterns that showed that DNA had a regular
structure.

In 1928, Frederick Griffith discovered that traits of the "smooth" form of the Pneumococcus 
could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" 
bacteria with the live "rough" form. Building on this system, the
Avery-MacLeod-McCarty experiment gave the first clear suggestion that DNA carried genetic
information, when Oswald Avery, along with coworkers Colin MacLeod and Maclyn McCarty,
identified DNA as the transforming principle in 1943. DNA's role in heredity was confirmed
in 1952, when Alfred Hershey and Martha Chase showed in the Hershey-Chase experiment that
DNA is the genetic material of the T2 phage.
DNA Helix controversy



In 1953 James D. Watson and Francis Crick suggested what is now accepted as the first
correct double-helix model of DNA structure in the journal Nature. Their double-helix
molecular model of DNA was based on a single X-ray diffraction image (labeled as
"Photo 51") taken by Rosalind Franklin and Raymond Gosling in May 1952, as well as on
the information that the DNA bases were paired, which was also obtained through private
communications from Erwin Chargaff in the previous years. Chargaff's rules played a very
important role in establishing double-helix configurations for B-DNA as well as A-DNA.

Experimental evidence supporting the Watson and Crick model was published in a series
of five articles in the same issue of Nature. Of these, Franklin and Gosling's paper was
the first publication of their own X-ray diffraction data and original analysis method that
partially supported the Watson and Crick model [139]; this issue also contained an article
on DNA structure by Maurice Wilkins and two of his colleagues, whose analysis and in vivo
B-DNA X-ray patterns also supported the presence in vivo of the double-helical DNA
configurations proposed by Crick and Watson for their double-helix molecular model of
DNA in the previous two pages of Nature. In 1962, after Franklin's death, Watson, Crick,
and Wilkins jointly received the Nobel Prize in Physiology or Medicine. Unfortunately,
Nobel rules of the time allowed only living recipients, but a vigorous debate continues on
who should receive credit for the discovery.

In an influential presentation in 1957, Crick laid out the "Central Dogma" of molecular
biology, which foretold the relationship between DNA, RNA, and proteins, and articulated
the "adaptor hypothesis". Final confirmation of the replication mechanism that was
implied by the double-helical structure followed in 1958 through the Meselson-Stahl
experiment. Further work by Crick and coworkers showed that the genetic code was
based on non-overlapping triplets of bases, called codons, allowing Har Gobind Khorana,
Robert W. Holley and Marshall Warren Nirenberg to decipher the genetic code. These
findings represent the birth of molecular biology. [144]
with foreword by Francis Crick; the definitive DNA textbook, revised in 1994 with a 9-page
postscript.

• Olby, Robert C. (2009). Francis Crick: A Biography. Plainview, N.Y: Cold Spring Harbor 
Laboratory Press. ISBN 0-87969-798-9. 

• Ridley, Matt (2006). Francis Crick: discoverer of the genetic code. Ashland, OH: Eminent
Lives, Atlas Books. ISBN 0-06-082333-X.

of the discovery of the structure of DNA. New York: Norton. ISBN 0-393-95075-1.

• Wilkins, Maurice (2003). The third man of the double helix the autobiography of Maurice 
Wilkins. Cambridge, Eng: University Press. ISBN 0-19-860665-6. 

External links 

• DNA (http://www.dmoz.org/Science/Biology/Biochemistry_and_Molecular_Biology/ 
Biomolecules/Nucleic_Acids/DNA//) at the Open Directory Project 

• DNA binding site prediction on protein (http://pipe.scs.fsu.edu/displar.html) 

• DNA coiling to form chromosomes (http://biostudio.com/c_ education mac. htm) 

• DNA from the Beginning (http://www.dnaftb.org/dnaftb/) Another DNA Learning 
Center site on DNA, genes, and heredity from Mendel to the human genome project. 

• DNA Lab, demonstrates how to extract DNA from wheat using readily available 
equipment and supplies. (http://ca.youtube.com/watch?v=iyb7fwduuGM) 

• DNA the Double Helix Game (http://nobelprize.org/educational_games/medicine/ 
dna_double_helix/) From the official Nobel Prize web site 

• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html) 

• Dolan DNA Learning Center (http://www.dnalc.org/) 

• Double Helix: 50 years of DNA (http://www.nature.com/nature/dna50/archive.html), 
Nature 

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre 
for Biotechnology Education 
• Francis Crick and James Watson talking on the BBC in 1962, 1972, and 1974 (http://
www.bbc.co.uk/bbcfour/audiointerviews/profilepages/crickwatsonl.shtml) 

• Genetic Education Modules for Teachers (http://www.genome.gov/10506718) — DNA 
from the Beginning Study Guide 

• Guide to DNA cloning (http://www.blackwellpublishing.com/trun/artwork/Animations/
cloningexp/cloningexp.html)

• Olby R (January 2003). " Quiet debut for the double helix (http://chem-faculty.ucsd.edu/ 
joseph/CHEM13/DNAl.pdf)". Nature 421 (6921): 402-5. doi: 10.1038/nature01397 
(http://dx.doi.org/10.1038/nature01397). PMID 12540907. http://chem-faculty.ucsd. 
edu/joseph/CHEM13/DNAl.pdf. 

• PDB Molecule of the Month pdb23_l (http://www.rcsb.org/pdb/static. 
do?p=education_discussion/molecule_of_the_month/pdb23 _l.html) 

• Rosalind Franklin's contributions to the study of DNA (http://mason.gmu.edu/
~emoody/rfranklin.html)

• The Register of Francis Crick Personal Papers 1938 - 2007 (http://orpheus.ucsd.edu/ 
speccoll/testing/html/mss0660a.html#abstract) at Mandeville Special Collections 
Library, Geisel Library, University of California, San Diego 

• U.S. National DNA Day (http://www.genome.gov/10506367) — watch videos and 
participate in real-time chat with top scientists 

• " Clue to chemistry of heredity found (http://www.nytimes.com/packages/pdf/science/ 
dna-article.pdf)". The New York Times. Saturday, June 13, 1953. http://www.nytimes. 
com/packages/pdf/science/dna-article.pdf. The first American newspaper coverage of 
the discovery of the DNA structure. 
Molecular models of DNA

Molecular models of DNA structures are representations of the molecular geometry and
topology of deoxyribonucleic acid (DNA) molecules, built by one of several means: closely
packed spheres made of plastic (CPK models), metal wires for 'skeletal models', graphic
computations and computer animations, artistic rendering, and so on. Their aim is to
simplify and present the essential physical and chemical properties of DNA molecular
structures, either in vivo or in vitro. Computer molecular models also allow animations
and molecular dynamics simulations that are very important for understanding how DNA
functions in vivo. A long-standing dynamic problem, for example, is how DNA
"self-replication" takes place in living cells, which should involve transient uncoiling
of supercoiled DNA fibers. Although DNA consists of relatively rigid, very large, elongated
biopolymer molecules called "fibers" or chains (made of repeating nucleotide units of four
basic types, attached to deoxyribose and phosphate groups), its molecular structure in vivo
undergoes dynamic configuration changes that involve dynamically attached water molecules
and ions. Supercoiling, packing with histones in chromosome structures, and other such
supramolecular aspects also involve in vivo DNA topology, which is even more complex than
DNA molecular geometry; this makes molecular modeling of DNA an especially challenging
problem for both molecular biologists and biotechnologists. Like other large molecules and
biopolymers, DNA often exists in multiple stable geometries (that is, it exhibits
conformational isomerism) and configurational quantum states that are close to each other
in energy on the potential energy surface of the DNA molecule. Such geometries can also be
computed, at least in principle, by employing ab initio quantum chemistry methods that
have high accuracy for small molecules. Such quantum geometries define an important class
of ab initio molecular models of DNA whose exploration has barely started.

In an interesting twist of roles, the DNA molecule itself was proposed to 
be utilized for quantum computing. Both DNA nanostructures as well as 
DNA 'computing' biochips have been built (see biochip image at right). 

The more advanced, computer-based molecular models of DNA involve 
molecular dynamics simulations as well as quantum mechanical 
computations of vibro-rotations, delocalized molecular orbitals (MOs), 
electric dipole moments, hydrogen-bonding, and so on. 




DNA computing biochip (3D).
Importance

From the very early stages of structural studies of DNA by X-ray diffraction and
biochemical means, molecular models such as the Watson-Crick double-helix model were
successfully employed to solve the 'puzzle' of DNA structure, and also to find how the
latter relates to its key functions in living cells. The first high-quality X-ray
diffraction patterns of A-DNA were reported by Rosalind Franklin and Raymond Gosling in
1953. The first calculations of the Fourier transform of an atomic helix were reported one
year earlier by Cochran, Crick and Vand, and were followed in 1953 by the computation of
the Fourier transform of a coiled-coil by Crick. The first reports of a double-helix
molecular model of B-DNA structure were made by Watson and Crick in 1953.

Last but not least, Maurice F. Wilkins, A. Stokes and H. R. Wilson reported the first
X-ray patterns of in vivo B-DNA in partially oriented salmon sperm heads. The development
of the first correct double-helix molecular model of DNA by Crick and Watson may not have
been possible without the biochemical evidence for the nucleotide base-pairing ([A-T];
[C-G]), or Chargaff's rules [6] [7] [8] [9] [10] [11].
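
The link between base-pairing and Chargaff's rules can be checked with a small Python
sketch: if every base on one strand is paired with its partner on the other, the duplex
as a whole must contain equal numbers of A and T, and of C and G. The helper name
duplex_composition and the example sequence are illustrative assumptions.

```python
from collections import Counter

# Watson-Crick pairing rules (illustrative lookup table).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def duplex_composition(strand: str) -> Counter:
    """Count bases over both strands of the duplex formed by `strand` and its partner."""
    strand = strand.upper()
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand + partner)


counts = duplex_composition("ATGCGGATTTACG")
print(counts["A"] == counts["T"], counts["C"] == counts["G"])  # True True
```

The equalities hold for any input sequence, since each A contributes exactly one T to
the partner strand and each C exactly one G.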




Spinning DNA 
generic model. 



Examples of DNA molecular models 

Animated molecular models allow one to visually explore the three-dimensional (3D)
structure of DNA. The first DNA model is a space-filling, or CPK, model of the DNA
double-helix whereas the third is an animated wire, or skeletal type, molecular model of
DNA. The last two DNA molecular models in this series depict quadruplex DNA that
may be involved in certain cancers. The last figure on this panel is a molecular
model of hydrogen bonds between water molecules in ice that are similar to those found in
DNA.
Hydrogen bonds




• Spacefilling model or CPK model - a molecule is represented by overlapping spheres 
representing the atoms. 




Images for DNA Structure Determination from X-Ray Patterns

The following images illustrate both the principles and the main steps involved in 
generating structural information from X-ray diffraction studies of oriented DNA fibers with 
the help of molecular models of DNA that are combined with crystallographic and 
mathematical analysis of the X-ray patterns. From left to right the gallery of images shows: 

• First row. 

• 1. Constructive X-ray interference, or diffraction, following Bragg's Law of X-ray 
"reflection by the crystal planes"; 

• 2. A comparison of A-DNA (crystalline) and highly hydrated B-DNA (paracrystalline) X-ray 
diffraction, and respectively, X-ray scattering patterns (courtesy of Dr. Herbert R. Wilson, 
FRS- see refs. list); 

• 3. Purified DNA precipitated in a water jug; 

• 4. The major steps involved in DNA structure determination by X-ray crystallography 
showing the important role played by molecular models of DNA structure in this iterative, 
structure-determination process; 

• Second row: 

• 5. Photo of a modern X-ray diffractometer employed for recording X-ray patterns of DNA 
with major components: X-ray source, goniometer, sample holder, X-ray detector and/or 
plate holder; 

• 6. Illustrated animation of an X-ray goniometer; 

• 7. X-ray detector at the SLAC synchrotron facility; 

• 8. Neutron scattering facility at ISIS in UK; 

• Third and fourth rows: Molecular models of DNA structure at various scales; figure 
#11 is an actual electron micrograph of a DNA fiber bundle, presumably of a single 
Paracrystalline lattice models of B-DNA structures

A paracrystalline lattice, or paracrystal, is a molecular or atomic lattice with significant
amounts (e.g., larger than a few percent) of partial disordering of molecular
arrangements. Limiting cases of the paracrystal model are nanostructures, such as
glasses, liquids, etc., that may possess only local ordering and no global order. Liquid
crystals also have paracrystalline rather than crystalline structures.
DNA Helix controversy in 1952

Highly hydrated B-DNA occurs naturally in living cells in such a paracrystalline state, which
is a dynamic one in spite of the relatively rigid DNA double-helix stabilized by parallel
hydrogen bonds between the nucleotide base-pairs in the two complementary, helical DNA
chains (see figures). For simplicity most DNA molecular models omit both water and ions
dynamically bound to B-DNA, and are thus less useful for understanding the dynamic
behaviors of B-DNA in vivo. The physical and mathematical analysis of X-ray and
spectroscopic data for paracrystalline B-DNA is therefore much more complicated than that
of crystalline A-DNA X-ray diffraction patterns. The paracrystal model is also important for
DNA technological applications such as DNA nanotechnology. Novel techniques that
combine X-ray diffraction of DNA with X-ray microscopy in hydrated living cells are now
also being developed (see, for example, "Application of X-ray microscopy in the analysis of
living hydrated cells").

Genomic and Biotechnology Applications of DNA molecular modeling

The following gallery of images illustrates various uses of DNA molecular modeling in 
Genomics and Biotechnology research applications from DNA repair to PCR and DNA 
nanostructures; each slide contains its own explanation and/or details. The first slide 
presents an overview of DNA applications, including DNA molecular models, with emphasis 
on Genomics and Biotechnology. 

Gallery: DNA Molecular modeling applications 
Databases for DNA molecular models and sequences

X-ray diffraction

• NDB ID: UD0017 Database [18]

• X-ray Atlas database [19]

• PDB files of coordinates for nucleic acid structures from X-ray diffraction by NA (incl.
DNA) crystals [20]

• Structure factors downloadable files in CIF format
Neutron scattering

• ISIS neutron source

• ISIS pulsed neutron source: A world centre for science with neutrons & muons at
Harwell, near Oxford, UK. [22]



X-ray microscopy

• Application of X-ray microscopy in the analysis of living hydrated cells [23]

Electron microscopy

• DNA under electron microscope [24]

Atomic Force Microscopy (AFM)

Two-dimensional DNA junction arrays have been visualized by Atomic Force Microscopy
(AFM) [25]. Other imaging resources for AFM/Scanning probe microscopy (SPM) can be
freely accessed at:

• How SPM Works [26]

• SPM Image Gallery - AFM STM SEM MFM NSOM and more. [27]

Gallery of AFM Images 
Mass spectrometry: MALDI informatics

[Figure: workflow diagram with the stages data acquisition, peak detection, list of peak
masses, list of peak intensities, and genotype, mutations, etc.]
Spectroscopy

• Vibrational circular dichroism (VCD)

• FT-NMR [28] [29]

• NMR Atlas database [30]

• mmcif downloadable coordinate files of nucleic acids in solution from 2D-FT NMR data
[31]

• NMR constraints files for NAs in PDB format [32]

• NMR microscopy [33]

• Microwave spectroscopy

• FT-IR

• FT-NIR [34] [35] [36]

• Spectral, Hyperspectral, and Chemical imaging [37] [38] [39] [40] [41] [42] [43]

• Raman spectroscopy/microscopy and CARS

• Fluorescence correlation spectroscopy, Fluorescence cross-correlation spectroscopy
and FRET

• Confocal microscopy [57]
Gallery: CARS (Raman spectroscopy), Fluorescence confocal microscopy, and
Hyperspectral imaging

Genomic and structural databases

• CBS Genome Atlas Database — contains examples of base skews. 

• The Z curve database of genomes — a 3-dimensional visualization and analysis tool of 
genomes [60] [61].

• DNA and other nucleic acids' molecular models: Coordinate files of nucleic acids 
molecular structure models in PDB and CIF formats 
External links

• DNA the Double Helix Game (http://nobelprize.org/educational_games/medicine/ 
dna_double_helix/) From the official Nobel Prize web site 

• MDDNA: Structural Bioinformatics of DNA (http://humphry.chem.wesleyan.edu:8080/
MDDNA/)

• Double Helix 1953-2003 (http://www.ncbe.reading.ac.uk/DNA50/) National Centre 
for Biotechnology Education 

• DNA under electron microscope (http://www.fidelitysystems.com/Unlinked_DNA.html) 

• Ascalaph DNA (http://www.agilemolecule.com/Ascalaph/Ascalaph_DNA.html) — 
Commercial software for DNA modeling 

• DNAlive: a web interface to compute DNA physical properties (http://mmb.pcb.ub.es/
DNAlive). Also allows cross-linking of the results with the UCSC Genome browser and
DNA dynamics.

• DiProDB: Dinucleotide Property Database (http://diprodb.fli-leibniz.de). The database is 
designed to collect and analyse thermodynamic, structural and other dinucleotide 
properties. 

• Further details of mathematical and molecular analysis of DNA structure based on X-ray 
data (http://planetphysics.org/encyclopedia/ 
BesselFunctionsApplicationsToDiffractionByHelicalStructures.html) 

• Bessel functions corresponding to Fourier transforms of atomic or molecular helices.
(http://planetphysics.org/?op=getobj&from=objects&
name=BesselFunctionsAndTheirApplicationsToDiffractionByHelicalStructures)

• Application of X-ray microscopy in analysis of living hydrated cells (http://www.ncbi.
nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&
list_uids=12379938)

• Characterization in nanotechnology some pdfs (http://nanocharacterization.sitesled. 
com/) 

• Overview of STM/AFM/SNOM principles with educative videos (http://www.ntmdt.ru/
SPM-Techniques/Principles/)

• SPM Image Gallery - AFM STM SEM MFM NSOM and More (http://www.rhk-tech.com/
results/showcase.php)

• How SPM Works (http://www.parkafm.com/New_html/resources/01general.php) 

• U.S. National DNA Day (http://www.genome.gov/10506367) — watch videos and 
participate in real-time discussions with scientists.

• The Secret Life of DNA - DNA Music compositions (http://www.tjmitchell.com/stuart/ 
dna.html) 
Genomics

Genomics is the study of the genomes of organisms. The field includes intensive efforts to
determine the entire DNA sequence of organisms and fine-scale genetic mapping efforts.
The field also includes studies of intragenomic phenomena such as heterosis, epistasis,
pleiotropy and other interactions between loci and alleles within the genome. In contrast,
the investigation of the roles and functions of single genes is a primary focus of molecular
biology and is a common topic of modern medical and biological research. Research on
single genes does not fall into the definition of genomics unless the aim of this genetic,
pathway, and functional information analysis is to elucidate its effect on, place in, and
response to the entire genome's networks.

For the United States Environmental Protection Agency, "the term 'genomics'
encompasses a broader scope of scientific inquiry associated technologies than when
genomics was initially considered. A genome is the sum total of all an individual organism's
genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA
(genotype), mRNA (transcriptome), or protein (proteome) levels."

History 

Genomics was established by Frederick Sanger when he first sequenced the complete
genomes of a virus and a mitochondrion. His group established techniques of sequencing,
genome mapping, data storage, and bioinformatic analyses in the 1970s and 1980s. A major
branch of genomics is still concerned with sequencing the genomes of various organisms,
but the knowledge of full genomes has created the possibility for the field of functional 
genomics, mainly concerned with patterns of gene expression during various conditions. 
The most important tools here are microarrays and bioinformatics. Study of the full set of
proteins in a cell type or tissue, and the changes during various conditions, is called
proteomics. A related concept is materiomics, which is defined as the study of the material 
properties of biological materials (e.g. hierarchical protein structures and materials, 
mineralized biological tissues, etc.) and their effect on the macroscopic function and failure 
in their biological context, linking processes, structure and properties at multiple scales 
through a materials science approach. The actual term 'genomics' is thought to have been 
coined by Dr. Tom Roderick, a geneticist at the Jackson Laboratory (Bar Harbor, ME) over 
beer at a meeting held in Maryland on the mapping of the human genome in 1986. 

In 1972, Walter Fiers and his team at the Laboratory of Molecular Biology of the University
of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for
bacteriophage MS2 coat protein. In 1976, the team determined the complete
nucleotide sequence of bacteriophage MS2 RNA. The first DNA-based genome to be
sequenced in its entirety was that of bacteriophage Φ-X174 (5,386 bp), sequenced by
Frederick Sanger in 1977.

The first free-living organism to have its genome sequenced was Haemophilus influenzae (1.8 Mb)
in 1995, and since then genomes have been sequenced at a rapid pace. A rough draft of the
human genome was completed by the Human Genome Project in early 2001, creating much
fanfare.

As of September 2007, the complete sequence was known of about 1879 viruses, 577
bacterial species and roughly 23 eukaryote organisms, of which about half are fungi.
Most of the bacteria whose genomes have been completely sequenced are problematic
disease-causing agents, such as Haemophilus influenzae. Of the other sequenced species, 
most were chosen because they were well-studied model organisms or promised to become 
good models. Yeast (Saccharomyces cerevisiae) has long been an important model 
organism for the eukaryotic cell, while the fruit fly Drosophila melanogaster has been a 
very important tool (notably in early pre-molecular genetics). The worm Caenorhabditis 
elegans is an often used simple model for multicellular organisms. The zebrafish 
Brachydanio rerio is used for many developmental studies on the molecular level and the 
flower Arabidopsis thaliana is a model organism for flowering plants. The Japanese 
pufferfish (Takifugu rubripes) and the spotted green pufferfish (Tetraodon nigroviridis) are 
interesting because of their small and compact genomes, containing very little non-coding 
DNA compared to most species. The mammals dog (Canis familiaris), brown rat
(Rattus norvegicus), mouse (Mus musculus), and chimpanzee (Pan troglodytes) are all
important model animals in medical research.

Bacteriophage genomics 

Bacteriophages have played and continue to play a key role in bacterial genetics and 
molecular biology. Historically, they were used to define gene structure and gene 
regulation. Also, the first genome to be sequenced was that of a bacteriophage. However,
bacteriophage research did not lead the genomics revolution, which is clearly dominated by 
bacterial genomics. Only very recently has the study of bacteriophage genomes become 
prominent, thereby enabling researchers to understand the mechanisms underlying phage 
evolution. Bacteriophage genome sequences can be obtained through direct sequencing of 
isolated bacteriophages, but can also be derived as part of microbial genomes. Analysis of 
bacterial genomes has shown that a substantial amount of microbial DNA consists of 
prophage sequences and prophage-like elements. A detailed database mining of these 
sequences offers insights into the role of prophages in shaping the bacterial genome. 

Cyanobacteria genomics 

At present there are 24 cyanobacteria for which a total genome sequence is available. 15 of 
these cyanobacteria come from the marine environment. These are six Prochlorococcus 
strains, seven marine Synechococcus strains, Trichodesmium erythraeum IMS101 and 
Crocosphaera watsonii WH8501. Several studies have demonstrated how these sequences 
could be used very successfully to infer important ecological and physiological 
characteristics of marine cyanobacteria. However, there are many more genome projects
currently in progress; among them are further Prochlorococcus and marine
Synechococcus isolates, Acaryochloris and Prochloron, the N₂-fixing filamentous
cyanobacteria Nodularia spumigena, Lyngbya aestuarii and Lyngbya majuscula, as well as
bacteriophages infecting marine cyanobacteria. Thus, the growing body of genome
information can also be tapped in a more general way to address global problems by 
applying a comparative approach. Some new and exciting examples of progress in this field 
are the identification of genes for regulatory RNAs, insights into the evolutionary origin of 
photosynthesis, or estimation of the contribution of horizontal gene transfer to the genomes 
that have been analyzed. 
See also

• Full Genome Sequencing 

• Computational genomics 

• Nitrogenomics 

• Metagenomics 

• Predictive Medicine 

• Personal genomics 
External links

• Genomics Directory (http://www.genomicsdirectory.com): A one-stop biotechnology 
resource center for bioentrepreneurs, scientists, and students 

• Annual Review of Genomics and Human Genetics (http://arjournals.annualreviews.org/ 
loi/genom/) 

• BMC Genomics (http://www.biomedcentral.com/bmcgenomics/): A BMC journal on 
Genomics 

• Genomics (http://www.genomics.co.uk/companylist.php): UK companies and
laboratories

• Genomics journal (http://www.elsevier.com/wps/find/journaldescription.
cws_home/622838/description#description)

• Genomics.org (http://genomics.org): An open, free wiki-based genomics portal

• NHGRI (http://www.genome.gov/): US government's genome institute 

• Pharmacogenomics in Drug Discovery and Development (http://www.springer.com/ 
humana+press/pharmacology+and+toxicology/book/978-1-58829-887-4), a book on
pharmacogenomics, diseases, personalized medicine, and therapeutics 

• Tishchenko P. D. Genomics: New Science in the New Cultural Situation (http://www. 
zpu-journal.ru/en/articles/detail.php?ID=342) 



• Undergraduate program on Genomic Sciences (Spanish) (http://www.lcg.unam.mx/): 
One of the first undergraduate programs in the world 

• JCVI Comprehensive Microbial Resource (http://cmr.jcvi.org/) 

• Pathema: A Clade Specific Bioinformatics Resource Center (http://pathema.jcvi.org/) 

• KoreaGenome.org (http://koreagenome.org): The first Korean Genome published and 
the sequence is available freely. 

• GenomicsNetwork (http://genomicsnetwork.ac.uk): Looks at the development and use 
of the science and technologies of genomics. 
Protein Interactions



Proteomics 



Proteomics is the large-scale study of proteins, particularly their structures and functions.
Proteins are vital parts of living organisms, as they are the main components of the
physiological metabolic pathways of cells. The term "proteomics" was first coined in 1997
to make an analogy with genomics, the study of the genes. The word "proteome" is a
blend of "protein" and "genome", and was coined by Prof Marc Wilkins in 1994 while
working on the concept as a PhD student. The proteome is the entire complement of
proteins, including the modifications made to a particular set of proteins, produced by an
organism or system. This will vary with time and distinct requirements, or stresses, that a
cell or organism undergoes.




Robotic preparation of MALDI mass spectrometry samples on a 
sample carrier. 



Complexity of the Problem 

After genomics, proteomics is often considered the next step in the study of biological 
systems. It is much more complicated than genomics mostly because while an organism's 
genome is more or less constant, the proteome differs from cell to cell and from time to 
time. This is because distinct genes are expressed in distinct cell types. This means that 
even the basic set of proteins which are produced in a cell needs to be determined. 

In the past this was done by mRNA analysis, but this was found not to correlate with
protein content. It is now known that mRNA is not always translated into protein,
and the amount of protein produced for a given amount of mRNA depends on the gene it is
transcribed from and on the current physiological state of the cell. Proteomics confirms the
presence of the protein and provides a direct measure of the quantity present.
Examples of post-translational modifications

Phosphorylation 

More importantly though, any particular protein may go through a wide variety of
alterations which will have critical effects on its function. For example, during cell signaling
many enzymes and structural proteins can undergo phosphorylation. The addition of a
phosphate to particular amino acids (most commonly serine and threonine, mediated by
serine/threonine kinases, or more rarely tyrosine, mediated by tyrosine kinases) causes a
protein to become a target for binding or interacting with a distinct set of other proteins
that recognize the phosphorylated domain.

Because protein phosphorylation is one of the most-studied protein modifications, many
"proteomic" efforts are geared to determining the set of phosphorylated proteins in a 
particular cell or tissue-type under particular circumstances. This alerts the scientist to the 
signaling pathways that may be active in that instance. 

Ubiquitination 

Ubiquitin is a small protein that can be affixed to certain protein substrates by enzymes 
called E3 ubiquitin ligases. Determining which proteins are poly-ubiquitinated can be 
helpful in understanding how protein pathways are regulated. This is therefore an 
additional legitimate "proteomic" study. Similarly, once it is determined what substrates are 
ubiquitinated by each ligase, determining the set of ligases expressed in a particular cell 
type will be helpful. 

Additional modifications 

Listing all the protein modifications that might be studied in a "Proteomics" project would 
require a discussion of most of biochemistry; therefore, a short list will serve here to 
illustrate the complexity of the problem. In addition to phosphorylation and ubiquitination, 
proteins can be subjected to methylation, acetylation, glycosylation, oxidation, nitrosylation, 
etc. Some proteins undergo all of these modifications, which nicely illustrates the
potential complexity one has to deal with when studying protein structure and function. 

Distinct proteins are made under distinct settings 

Even if one is studying a particular cell type, that cell may make different sets of proteins at 
different times, or under different conditions. Furthermore, as mentioned, any one protein 
can undergo a wide range of post-translational modifications. 

Therefore a "proteomics" study can become quite complex very quickly, even if the object of
the study is very restricted. In more ambitious settings, such as when a biomarker for a
tumor is sought and the proteomics scientist is obliged to study serum samples from
multiple cancer patients, the amount of complexity that must be dealt with is as great as in
any modern biological project.
Rationale for proteomics

The key requirement in understanding protein function is to learn to correlate the vast 
array of potential protein modifications to particular phenotypic settings, and then 
determine if a particular post-translational modification is required for a function to occur. 

Limitations to genomic study 

Scientists are very interested in proteomics because it gives a much better understanding 
of an organism than genomics. First, the level of transcription of a gene gives only a rough 
estimate of its level of expression into a protein. An mRNA produced in abundance may be 
degraded rapidly or translated inefficiently, resulting in a small amount of protein. Second,
as mentioned above, many proteins experience post-translational modifications that
profoundly affect their activities; for example, some proteins are not active until they
become phosphorylated. Methods such as phosphoproteomics and glycoproteomics are 
used to study post-translational modifications. Third, many transcripts give rise to more 
than one protein, through alternative splicing or alternative post-translational 
modifications. Fourth, many proteins form complexes with other proteins or RNA 
molecules, and only function in the presence of these other molecules. Finally, protein 
degradation rate plays an important role in protein content. 
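
The first and last of these points can be illustrated with a simple steady-state model. The sketch below is only an illustration, not a method described in this text: it assumes that protein is produced at a rate proportional to mRNA abundance and removed by first-order degradation, so that the steady-state protein level is (translation rate × mRNA) / degradation rate. The function name, gene names and all numbers are hypothetical.

```python
def steady_state_protein(mrna, k_translation, k_degradation):
    """Steady-state level P for dP/dt = k_translation * mrna - k_degradation * P.

    Assumed toy model: constant mRNA level, first-order protein degradation.
    """
    if k_degradation <= 0:
        raise ValueError("degradation rate must be positive")
    return k_translation * mrna / k_degradation


# Hypothetical genes: (mRNA copies, proteins made per mRNA per hour,
#                      fraction of protein degraded per hour)
genes = {
    "gene_A": (100, 2.0, 1.0),   # abundant mRNA, rapidly degraded protein
    "gene_B": (10, 2.0, 0.01),   # scarce mRNA, stable protein
}

levels = {name: steady_state_protein(*params) for name, params in genes.items()}
for name, level in levels.items():
    print(name, level)
```

In this toy case the gene with ten times less mRNA ends up with the higher protein level, because its protein is degraded far more slowly. That is the kind of discrepancy a transcript-only measurement would miss.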

Methods of studying proteins 

Determining proteins which are post-translationally modified 

One way in which a particular modification can be studied is to develop an antibody which is
specific to that modification. For example, there are antibodies which only recognize
certain proteins when they are tyrosine-phosphorylated; also, there are antibodies specific 
to other modifications. These can be used to determine the set of proteins that have 
undergone the modification of interest. 

For sugar modifications, such as glycosylation of proteins, certain lectins have been 
discovered which bind sugars. These too can be used. 

A more common way to determine the post-translational modification of interest is to subject a
complex mixture of proteins to electrophoresis in "two dimensions", which simply means
that the proteins are electrophoresed first in one direction, and then in another. This
allows small differences in a protein to be visualized by separating a modified protein from
its unmodified form. This methodology is known as "two-dimensional gel electrophoresis".

Recently, another approach called PROTOMAP has been developed, which combines
SDS-PAGE with shotgun proteomics to enable detection of changes in gel migration such as
those caused by proteolysis or post-translational modification.
Determining the existence of proteins in complex mixtures

Classically, antibodies to particular proteins or to their modified forms have been used in 
biochemistry and cell biology studies. These are among the most common tools used by 
practicing biologists today. 

For more quantitative determinations of protein amounts, techniques such as ELISAs can 
be used. 

For proteomic study, more recent techniques such as matrix-assisted laser
desorption/ionization (MALDI) have been employed for rapid determination of proteins in
particular mixtures.

Establishing protein-protein interactions 

Most proteins function in collaboration with other proteins, and one goal of proteomics is to 
identify which proteins interact. This is especially useful in determining potential partners 
in cell signaling cascades. 

Several methods are available to probe protein-protein interactions. The traditional method 
is yeast two-hybrid analysis. Newer methods include protein microarrays, immunoaffinity
chromatography followed by mass spectrometry, other experimental methods such as phage
display, and computational methods.

Practical applications of proteomics 

One of the most promising developments to come from the study of human genes and 
proteins has been the identification of potential new drugs for the treatment of disease. 
This relies on genome and proteome information to identify proteins associated with a 
disease, which computer software can then use as targets for new drugs. For example, if a 
certain protein is implicated in a disease, its 3D structure provides the information to 
design drugs to interfere with the action of the protein. A molecule that fits the active site 
of an enzyme, but cannot be released by the enzyme, will inactivate the enzyme. This is the 
basis of new drug-discovery tools, which aim to find new drugs to inactivate proteins 
involved in disease. As genetic differences among individuals are found, researchers expect 
to use these techniques to develop personalized drugs that are more effective for the 
individual. 

A computer technique which attempts to fit millions of small molecules to the 
three-dimensional structure of a protein is called "virtual ligand screening". The computer 
rates the quality of the fit to various sites in the protein, with the goal of either enhancing 
or disabling the function of the protein, depending on its function in the cell. A good 
example of this is the identification of new drugs to target and inactivate the HIV-1 
protease. The HIV-1 protease is an enzyme that cleaves a very large HIV protein into 
smaller, functional proteins. The virus cannot survive without this enzyme; therefore, it is 
one of the most effective protein targets for killing HIV. 
Biomarkers

Understanding the proteome, the structure and function of each protein and the 
complexities of protein-protein interactions will be critical for developing the most effective 
diagnostic techniques and disease treatments in the future. 

An interesting use of proteomics is using specific protein biomarkers to diagnose disease. A 
number of techniques allow testing for proteins produced during a particular disease, which
helps to diagnose the disease quickly. Techniques include western blot, 
immunohistochemical staining, enzyme linked immunosorbent assay (ELISA) or mass 
spectrometry. The following are some of the diseases that have characteristic biomarkers 
that physicians can use for diagnosis. 

Alzheimer's disease 

In Alzheimer's disease, elevations in beta secretase create amyloid/beta-protein, which 
causes plaque to build up in the patient's brain, which is thought to play a role in dementia. 
Targeting this enzyme decreases the amyloid/beta-protein and so slows the progression of 
the disease. A procedure to test for the increase in amyloid/beta-protein is
immunohistochemical staining, in which antibodies bind to amyloid/beta-protein antigens
in biological tissue.

Heart disease 

Heart disease is commonly assessed using several key protein-based biomarkers. Standard
protein biomarkers for cardiovascular disease (CVD) include interleukin-6, interleukin-8,
serum amyloid A protein, fibrinogen, and troponins. Cardiac troponin I (cTnI) increases in
concentration within 3 to 12 hours of initial cardiac injury and can be found elevated days
after an acute myocardial infarction (MI). A number of commercial antibody-based assays as
well as other methods are used in hospitals as primary tests for acute MI.

See also 
Protein databases

• UniProt 

• Protein Information Resource (PIR) 

• Swiss-Prot 

• Protein Data Bank (PDB) 

• National Center for Biotechnology Information (NCBI) 

• Human Protein Reference Database 

• Proteopedia: the collaborative, 3D encyclopedia of proteins and other molecules.
Protein-protein interaction

Protein-protein interactions involve not only the direct-contact association of protein
molecules but also longer-range interactions through the aqueous electrolyte solution
surrounding neighboring hydrated proteins, over distances from less than one
nanometer to several tens of nanometers. Furthermore, such protein-protein
interactions are thermodynamically linked functions of dynamically bound ions and water
that exchange rapidly with the surrounding solution by comparison with the molecular
tumbling rate (or correlation times) of the interacting proteins. Protein associations are also
studied from the perspectives of biochemistry, quantum chemistry, molecular dynamics,
signal transduction and other metabolic or genetic/epigenetic networks. Indeed,
protein-protein interactions are at the core of the entire Interactomics system of any
living cell.

The interactions between proteins are important for many, if not all, biological
functions. For example, signals from the exterior of a cell are mediated to the inside of that
cell by protein-protein interactions of the signaling molecules. This process, called signal 
transduction, plays a fundamental role in many biological processes and in many diseases 
(e.g. cancers). Proteins might interact for a long time to form part of a protein complex, a 
protein may be carrying another protein (for example, from cytoplasm to nucleus or vice 
versa in the case of the nuclear pore importins), or a protein may interact briefly with 
another protein just to modify it (for example, a protein kinase will add a phosphate to a 
target protein). This modification of proteins can itself change protein-protein interactions. 
For example, some proteins with SH2 domains only bind to other proteins when they are 
phosphorylated on the amino acid tyrosine while bromodomains specifically recognise 
acetylated lysines. In conclusion, protein-protein interactions are of central importance for 
virtually every process in a living cell. Information about these interactions improves our
understanding of diseases and can provide the basis for new therapeutic approaches. 

Methods to investigate protein-protein interactions 

Biochemical methods 

Because protein-protein interactions are so important, there are a multitude of methods to detect
them. Each of the approaches has its own strengths and weaknesses, especially with regard 
to the sensitivity and specificity of the method. A high sensitivity means that many of the 
interactions that occur in reality are detected by the screen. A high specificity indicates 
that most of the interactions detected by the screen are also occurring in reality. 
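
The two quality measures just described can be made concrete with a small calculation. The following is a minimal sketch, assuming that a trusted reference set of real interactions is available for comparison; the function name and the protein labels are hypothetical. Note that the second quantity, as defined in the text above (the share of detected interactions that are real), is what is often called precision or positive predictive value elsewhere.

```python
def screen_quality(detected, true_interactions):
    """Return (sensitivity, fraction_real) for a protein interaction screen.

    sensitivity   = real interactions that were detected / all real interactions
    fraction_real = detected interactions that are real / all detected interactions
    Interactions are unordered pairs, so each pair is stored as a frozenset.
    """
    detected = {frozenset(pair) for pair in detected}
    true_interactions = {frozenset(pair) for pair in true_interactions}
    if not detected or not true_interactions:
        raise ValueError("both sets must contain at least one interaction")
    true_positives = detected & true_interactions
    sensitivity = len(true_positives) / len(true_interactions)
    fraction_real = len(true_positives) / len(detected)
    return sensitivity, fraction_real


# Hypothetical reference set and screen result
reference = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
screen = [("B", "A"), ("C", "A"), ("A", "E")]

sensitivity, fraction_real = screen_quality(screen, reference)
print(sensitivity, fraction_real)
```

Here the screen recovers two of the four reference interactions (half of them), and two of its three reported pairs are real (two thirds). A method can score well on one measure and poorly on the other, which is why both are considered when comparing the approaches listed below.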

• Co-immunoprecipitation is considered to be the gold standard assay for protein-protein 
interactions, especially when it is performed with endogenous (not overexpressed and 
not tagged) proteins. The protein of interest is isolated with a specific antibody. 
Interaction partners which stick to this protein are subsequently identified by western 
blotting. Interactions detected by this approach are considered to be real. However, this 
method can only verify interactions between suspected interaction partners. Thus, it is 
not a screening approach. A note of caution also is that immunoprecipitation experiments 
reveal direct and indirect interactions. Thus, positive results may indicate that two 
proteins interact directly or may interact via a bridging protein. 

• Bimolecular Fluorescence Complementation (BiFC) is a new technique for observing the
interactions of proteins. Combined with other new techniques, this method can be used
to screen protein-protein interactions and their modulators.

• Affinity electrophoresis is used for estimation of binding constants, as for instance in
lectin affinity electrophoresis, or for characterization of molecules with specific features like
glycan content or ligand binding.

• Pull-down assays are a common variation of immunoprecipitation and 
immunoelectrophoresis and are used identically, although this approach is more 
amenable to an initial screen for interacting proteins. 

• Label transfer can be used for screening or confirmation of protein interactions and can 
provide information about the interface where the interaction takes place. Label transfer 
can also detect weak or transient interactions that are difficult to capture using other in 
vitro detection strategies. In a label transfer reaction, a known protein is tagged with a 
detectable label. The label is then passed to an interacting protein, which can then be 
identified by the presence of the label. 

• The yeast two-hybrid screen investigates the interaction between artificial fusion 
proteins inside the nucleus of yeast. This approach can identify binding partners of a 
protein in an unbiased manner. However, the method has a notoriously high false-positive
rate which makes it necessary to verify the identified interactions by 
co-immunoprecipitation. 

• In-vivo crosslinking of protein complexes using photo-reactive amino acid analogs was
introduced in 2005 by researchers from the Max Planck Institute. In this method, cells
are grown with photoreactive diazirine analogs to leucine and methionine, which are 
incorporated into proteins. Upon exposure to ultraviolet light, the diazirines are activated 
and bind to interacting proteins that are within a few angstroms of the photo-reactive 
amino acid analog. 
• The tandem affinity purification (TAP) method allows high-throughput identification of
protein interactions. In contrast to the yeast two-hybrid (Y2H) approach, the accuracy of
the method can be compared to that of small-scale experiments (Collins et al., 2007), and
the interactions are detected within the correct cellular environment, as with
co-immunoprecipitation. However, the TAP tag method requires two successive steps of
protein purification and consequently cannot readily detect transient protein-protein
interactions. Recent genome-wide TAP experiments were performed by Krogan et al., 2006
and Gavin et al., 2006, providing updated protein interaction data for yeast.

• Chemical crosslinking is often used to "fix" protein interactions in place before trying to 
isolate/identify interacting proteins. Common crosslinkers for this application include the 
non-cleavable NHS-ester crosslinker, bis-sulfosuccinimidyl suberate (BS3); a cleavable
version of BS3, dithiobis(sulfosuccinimidyl propionate) (DTSSP); and the imidoester 
crosslinker dimethyl dithiobispropionimidate (DTBP) that is popular for fixing 
interactions in ChIP assays. 

• Chemical crosslinking followed by high mass MALDI mass spectrometry can be used to 
analyze intact protein interactions in place before trying to isolate/identify interacting 
proteins. This method detects interactions among non-tagged proteins and is available 
from CovalX. 

• SPINE (Strep-protein interaction experiment) uses a combination of reversible 
crosslinking with formaldehyde and an incorporation of an affinity tag to detect 
interaction partners in vivo. 

• Quantitative immunoprecipitation combined with knock-down (QUICK) relies on 
co-immunoprecipitation, quantitative mass spectrometry (SILAC) and RNA interference 
(RNAi). This method detects interactions among endogenous non-tagged proteins.
Thus, it has the same high confidence as co-immunoprecipitation. However, this method 
also depends on the availability of suitable antibodies. 

Physical/Biophysical and Theoretical methods 

• Dual Polarisation Interferometry (DPI) can be used to measure protein-protein 
interactions. DPI provides real-time, high-resolution measurements of molecular size, 
density and mass. While tagging is not necessary, one of the protein species must be 
immobilized on the surface of a waveguide. 

• Static light scattering (SLS) measures changes in the Rayleigh scattering of protein
complexes in solution and can non-destructively characterize both weak and strong
interactions without tagging or immobilization of the protein. The measurement consists
of mixing a series of aliquots of different concentrations or compositions with the analyte,
measuring the changes in light scattering that result from the interaction, and
fitting the correlated light scattering changes with concentration to a model. Weak,
non-specific interactions are typically characterized via the second virial coefficient. This
type of analysis can determine the equilibrium association constant for associated
complexes. Additional light scattering methods for protein activity determination
were previously developed by Timasheff. More recent dynamic light scattering (DLS)
methods for proteins were reported by H. Chou that are also applicable at high protein
concentrations and in protein gels; DLS may thus also be applicable for in vivo
cytoplasmic observations of various protein-protein interactions.

• Surface plasmon resonance can be used to measure protein-protein interaction. 
• With fluorescence correlation spectroscopy (FCS), one protein is labeled with a fluorescent
dye and the other is left unlabeled. The two proteins are then mixed, and the data give the
fraction of the labeled protein that is unbound and bound to the other protein, providing
a measure of the dissociation constant (K_d) and hence of binding affinity. Time-course
measurements can also be taken to characterize binding kinetics. FCS also reports the size
of the formed complexes, so the stoichiometry of binding can be measured. A more powerful
method is fluorescence cross-correlation spectroscopy (FCCS), which employs double labeling
techniques and cross-correlation, resulting in vastly improved signal-to-noise ratios over
FCS. Furthermore, two-photon and three-photon excitation practically eliminates
photobleaching effects and provides ultra-fast recording of FCCS or FCS data.

• Fluorescence resonance energy transfer (FRET) is a common technique when observing
the interactions of only two different proteins.

• Protein activity determination by NMR multi-nuclear relaxation measurements, or 2D-FT
NMR spectroscopy in solutions, combined with nonlinear regression analysis of NMR
relaxation or 2D-FT spectroscopy data sets. Whereas the concept of water activity is
widely known and utilized in the applied biosciences, its complement, the protein activity,
which quantitates protein-protein interactions, is much less familiar to bioscientists as it
is more difficult to determine in dilute solutions of proteins; protein activity is also much
harder to determine for concentrated protein solutions when protein aggregation, not
merely transient protein association, is often the dominant process.

• Theoretical modeling of protein-protein interactions involves a detailed physical 
chemistry/thermodynamic understanding of several effects involved, such as 
intermolecular forces, ion-binding, proton fluctuations and proton exchange. The theory 
of thermodynamically linked functions is one such example in which ion-binding and 
protein-protein interactions are treated as linked processes; this treatment is especially 
important for proteins that have enzymatic activity which depends on cofactor ions 
dynamically bound at the enzyme active site, as for example, in the case of 
oxygen-evolving enzyme system (OES) in photosynthetic biosystems where the oxygen
molecule binding is linked to the chloride anion binding as well as the linked state
transition of the manganese ions present at the active site in Photosystem II (PSII).
Another example of thermodynamically linked functions of ions and protein activity is
that of divalent calcium and magnesium cations to myosin in mechanical energy
transduction in muscle. Last but not least, chloride ion and oxygen binding to hemoglobin
(from several mammalian sources, including human) is a very well-known example of 
such thermodynamically linked functions for which a detailed and precise theory has 
been already developed. 

• Molecular dynamics (MD) computations of protein-protein interactions. 

• Protein-protein docking, the prediction of protein-protein interactions based only on the
three-dimensional protein structures from X-ray diffraction of protein crystals, might not
be satisfactory. [9] [10]

Network visualization of protein-protein interactions

Visualization of protein-protein interaction networks is a popular application of scientific 
visualization techniques. Although protein interaction diagrams are common in textbooks, 
diagrams of whole cell protein interaction networks were not as common since the level of 
complexity made them difficult to generate. One example of a manually produced molecular 
interaction map is Kurt Kohn's 1999 map of cell cycle control. Drawing on Kohn's map, 
in 2000 Schwikowski, Uetz, and Fields published a paper on protein-protein interactions in 
yeast, linking together 1,548 interacting proteins determined by two-hybrid testing. They 
used a force-directed (Sugiyama) graph drawing algorithm to automatically generate an 
image of their network. [12] [13] [14]

An experimental view of Kurt Kohn's 1999 map (gmap). The image was merged with GIMP
2.2.17 and then uploaded to maplib.net.

External links

• National Center for Integrative Biomedical Informatics (NCIBI)
(http://portal.ncibi.org/gateway/)

• Proteins and Enzymes
(http://www.dmoz.org/Science/Biology/Biochemistry_and_Molecular_Biology/Biomolecules/Proteins_and_Enzymes/)
at the Open Directory Project

• FLIM Applications (http://www.nikoninstruments.com/infocenter.php?n=FLIM). FLIM is
also often used in microspectroscopic/chemical imaging, or microscopic, studies to
monitor spatial and temporal protein-protein interactions, properties of membranes and
interactions with nucleic acids in living cells.

The Interactome



Metabolic network 



A metabolic network is the complete set of metabolic and physical processes that 
determine the physiological and biochemical properties of a cell. As such, these networks 
comprise the chemical reactions of metabolism as well as the regulatory interactions that 
guide these reactions. 

With the sequencing of complete genomes, it is now possible to reconstruct the network of
biochemical reactions in many organisms, from bacteria to human. Several of these
networks are available online: Kyoto Encyclopedia of Genes and Genomes (KEGG) [1],
EcoCyc [2] and BioCyc [3]. Metabolic networks are powerful tools for studying and
modelling metabolism, with applications ranging from the study of network topology with
graph theory to predictive toxicology and ADME.

See also

• Metabolic network modelling

• Metabolic pathway

Metabolic network modelling



Metabolic network reconstruction and simulation provide in-depth insight into the
molecular mechanisms of a particular organism, especially by correlating the genome with
molecular physiology (Francke, Siezen, and Teusink 2005). A reconstruction breaks down
metabolic pathways into their respective reactions and enzymes, and analyzes them
within the perspective of the entire network. Examples of metabolic pathways
include glycolysis, the Krebs cycle and the pentose phosphate pathway. In simplified terms,
a reconstruction involves collecting all of the relevant metabolic information of an
organism and then compiling it in a way that makes sense for various types of analyses to
be performed. The correlation between the genome and metabolism is made by
searching gene databases, such as KEGG [1] and GeneDB [2], for particular genes by
inputting enzyme or protein names. For example, a search can be conducted based on the
protein name or the EC number (a number that represents the catalytic function of the
enzyme of interest) in order to find the associated gene (Francke et al. 2005).
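
The kind of lookup described here can be sketched in a few lines of Python. This is only an illustration on a small hand-made table: the table rows, the field names and the function `find_genes` are assumptions made for the example and do not reflect the actual search interface of KEGG or GeneDB.

```python
# Hypothetical, hand-made annotation table for illustration only.
# It is not downloaded from KEGG, GeneDB or any other database.
ANNOTATIONS = [
    {"gene": "HXK1", "enzyme": "hexokinase", "ec": "2.7.1.1"},
    {"gene": "HXK2", "enzyme": "hexokinase", "ec": "2.7.1.1"},
    {"gene": "PFK1", "enzyme": "6-phosphofructokinase", "ec": "2.7.1.11"},
    {"gene": "CDC19", "enzyme": "pyruvate kinase", "ec": "2.7.1.40"},
]


def find_genes(query, table=ANNOTATIONS):
    """Return the genes whose enzyme name or EC number equals the query."""
    q = query.strip().lower()
    return sorted(
        row["gene"]
        for row in table
        if q in (row["enzyme"].lower(), row["ec"])
    )


print(find_genes("2.7.1.1"))          # search by EC number
print(find_genes("Pyruvate kinase"))  # search by enzyme name
```

The match is an exact comparison, so a query for EC 2.7.1.1 does not accidentally return entries for EC 2.7.1.11.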




Metabolic network showing interactions between enzymes and metabolites in the Arabidopsis
thaliana citric acid cycle. Enzymes and metabolites are the red dots and interactions
between them are the lines.

Metabolic Network Model for Escherichia coli.



Beginning steps of a reconstruction

Resources

Below is a more detailed description of a few gene/enzyme/reaction/pathway databases
that are crucial to a metabolic reconstruction:

• Kyoto Encyclopedia of Genes and Genomes (KEGG): This is a bioinformatics database
containing information on genes, proteins, reactions, and pathways. The 'KEGG Organisms'
section, which is divided into eukaryotes and prokaryotes, encompasses many organisms
for which gene and DNA information can be searched by typing in the enzyme of choice.
This resource can be extremely useful when building the association between metabolic
enzymes, reactions and genes.

• Gene DataBase (GeneDB): Similar to the KEGG resource, the Gene DataBase provides 
access to genomes of various organisms. If a search for hexokinase is carried out, genes 
for the organism of interest can be easily found. Moreover, the metabolic process 
associated with the enzyme is also listed along with the information on the genes (in the 
case of hexokinase, the pathway is glycolysis). Therefore, with one click, it is very easy to 
access all the different genes that are associated with glycolysis. Furthermore, GeneDB 
has a hierarchical organizational structure for metabolism, so it is possible to see the
level of the hierarchy at which one is currently working. This helps broaden an understanding
of the biological and chemical processes that are involved in the organism.

• BioCyc, EcoCyc and MetaCyc: BioCyc is a collection of over 200 pathway/genome
databases, containing whole databases dedicated to certain organisms. For example,
EcoCyc, which falls under the giant umbrella of BioCyc, is a highly detailed
bioinformatics database on the genome and metabolic reconstruction of Escherichia coli,
including thorough descriptions of the various signaling pathways. The EcoCyc database
can serve as a paradigm and model for any reconstruction. Additionally, MetaCyc, an
encyclopedia of metabolic pathways, contains a wealth of information on metabolic
reactions derived from over 600 different organisms.

• Pathway Tools [3]: This is a bioinformatics package that assists in the construction of 
pathway/genome databases such as EcoCyc (Francke et al. 2005). Developed by Peter 
Karp and associates at the SRI International Bioinformatics Group, Pathway Tools 
comprises several separate units that work together to generate new pathway/genome 
databases. First, PathoLogic takes an annotated genome for an organism and infers 
probable metabolic pathways to produce a new pathway/genome database. This can be 
followed by application of the Pathway Hole Filler, which predicts likely genes to fill
"holes" (missing steps) in predicted pathways. Afterward, the Pathway Tools Navigator
and Editor functions let users visualize, analyze, access and update the database. Thus, 
using PathoLogic and encyclopedias like MetaCyc, an initial fast reconstruction can be 
developed automatically, and then using the other units of Pathway Tools, a very detailed 
manual update, curation and verification step can be carried out (SRI 2005). 

• ENZYME: This is an enzyme nomenclature database (part of the ExPASy [4]
proteomics server of the Swiss Institute of Bioinformatics). After searching for a
particular enzyme on the database, this resource gives you the reaction that is catalyzed.
Additionally, ENZYME has direct links to various other gene/enzyme/medical literature
databases such as KEGG, BRENDA, PUBMED, and PUMA2, to name a few.

• BRENDA: A comprehensive enzyme database, BRENDA, allows you to search for an 
enzyme by name or EC number. You can also search for an organism and find all the 
relevant enzyme information. Moreover, when an enzyme search is carried out, BRENDA 
provides a list of all organisms containing the particular enzyme of interest. 

• PUBMED: This is an online library developed by the National Center for Biotechnology 
Information, which contains a massive collection of medical journals. Using the link 
provided by ENZYME, the search can be directed towards the organism of interest, thus 
recovering literature on the enzyme and its use inside of the organism. 

Next steps of the reconstruction 

After the initial stages of the reconstruction, a systematic verification is made in order to 
make sure no inconsistencies are present and that all the entries listed are correct and 
accurate (Francke et al. 2005). Furthermore, previous literature can be researched in order
to support any information obtained from one of the many metabolic reaction and genome 
databases. This provides an added level of assurance for the reconstruction that the enzyme 
and the reaction it catalyzes do actually occur in the organism. 

Any new reactions not present in the databases need to be added to the reconstruction. The 
presence or absence of certain reactions of the metabolism will affect the amount of 
reactants/products that are present for other reactions within the particular pathway. This 
is because products in one reaction go on to become the reactants for another reaction, i.e. 
products of one reaction can combine with other proteins or compounds to form new 
proteins/compounds in the presence of different enzymes or catalysts (Francke et al. 2005).

Francke et al. (2005) provide an excellent example of why the verification step of the
project needs to be performed in significant detail. During a metabolic network
reconstruction of Lactobacillus plantarum, the model showed that succinyl-CoA was one of
the reactants for a reaction that was part of the biosynthesis of methionine. However, an
understanding of the physiology of the organism would have revealed that, due to an
incomplete tricarboxylic acid pathway, Lactobacillus plantarum does not actually produce
succinyl-CoA, and the correct reactant for that part of the reaction was acetyl-CoA.
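
One simple automated check in the spirit of this example is to flag every reactant that no reaction in the draft produces and that the growth medium does not supply. The sketch below is a deliberately simplified assumption: the reaction names, the two toy reactions and the medium are placeholders, not the actual Lactobacillus plantarum model of Francke et al.

```python
def unsupported_reactants(reactions, medium):
    """Map each reaction to the reactants that nothing in the model can provide."""
    available = set(medium)
    for reaction in reactions.values():
        available.update(reaction["products"])
    problems = {}
    for name, reaction in reactions.items():
        missing = sorted(set(reaction["reactants"]) - available)
        if missing:
            problems[name] = missing
    return problems


# Toy draft model (placeholder reactions, heavily simplified stoichiometry).
medium = ["pyruvate", "CoA", "homoserine"]
draft = {
    "acetyl_coa_synthesis": {"reactants": ["pyruvate", "CoA"], "products": ["acetyl-CoA"]},
    "methionine_step": {"reactants": ["homoserine", "succinyl-CoA"], "products": ["intermediate"]},
}
print(unsupported_reactants(draft, medium))

# After manual curation the step uses acetyl-CoA, which the model does produce.
curated = dict(draft)
curated["methionine_step"] = {"reactants": ["homoserine", "acetyl-CoA"], "products": ["intermediate"]}
print(unsupported_reactants(curated, medium))
```

A check like this only finds candidates for review; deciding which reactant is correct still requires knowledge of the organism's physiology.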

Therefore, systematic verification of the initial reconstruction will bring to light
inconsistencies that can adversely affect the final interpretation of the reconstruction,
whose purpose is to accurately comprehend the molecular mechanisms of the organism.
Furthermore, the simulation step also ensures that all the reactions present in the
reconstruction are properly balanced. To sum up, a reconstruction that is fully accurate can
lead to greater insight into the functioning of the organism of interest
(Francke et al. 2005).
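
What "properly balanced" means can be illustrated with an elemental balance check. The sketch below is an assumption about how such a check might look, not the procedure used by Francke et al.; it handles only simple formulas without parentheses or charges, and it uses the neutral formulas of the compounds in the hexokinase reaction (glucose + ATP -> glucose 6-phosphate + ADP).

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_totals(side):
    """Sum the atoms over one side of a reaction given as (formula, coefficient) pairs."""
    totals = Counter()
    for formula, coefficient in side:
        for element, count in parse_formula(formula).items():
            totals[element] += coefficient * count
    return totals


def is_balanced(reactants, products):
    """True if every element occurs equally often on both sides."""
    return element_totals(reactants) == element_totals(products)


# Neutral formulas: glucose, ATP, glucose 6-phosphate, ADP.
substrates = [("C6H12O6", 1), ("C10H16N5O13P3", 1)]
products = [("C6H13O9P", 1), ("C10H15N5O10P2", 1)]

print(is_balanced(substrates, products))      # complete reaction
print(is_balanced(substrates, products[:1]))  # ADP forgotten on the product side
```
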

Advantages of a reconstruction

• Several inconsistencies exist between gene, enzyme, and reaction databases and 
published literature sources regarding the metabolic information of an organism. A 
reconstruction is a systematic verification and compilation of data from various sources 
that takes into account all of the discrepancies. 

• A reconstruction combines the relevant metabolic and genomic information of an 
organism. 

• A reconstruction also allows for metabolic comparisons to be performed between different
strains of the same species as well as between different organisms.

Metabolic network simulation 

A metabolic network can be broken down into a stoichiometric matrix in which the rows
represent the compounds of the reactions, while the columns of the matrix correspond to
the reactions themselves. Stoichiometry is the quantitative relationship between the
substrates and products of a chemical reaction (Merriam 2002). In order to deduce what the
metabolic network suggests, recent research has centered on two approaches, namely extreme
pathways and elementary mode analysis (Papin, Stelling, Price, Klamt, Schuster, and Palsson 2004).
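
As a minimal sketch of this representation, the following Python builds a stoichiometric matrix for a hypothetical four-reaction toy network. The network, the reaction names and the function `stoichiometric_matrix` are assumptions made for illustration; the sign convention (negative for consumed, positive for produced) is the usual one but is not stated in the text above.

```python
def stoichiometric_matrix(reactions):
    """Build a stoichiometric matrix from {reaction: {metabolite: coefficient}}.

    Rows follow the sorted metabolite names, columns follow the reaction order.
    Negative coefficients mark consumed metabolites, positive ones produced metabolites.
    """
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


# Hypothetical toy network: substrate A is taken up, then converted to B or secreted.
toy_network = {
    "uptake": {"A": 1},               # -> A
    "conversion": {"A": -1, "B": 1},  # A -> B
    "secretion": {"A": -1},           # A ->
    "biomass": {"B": -1},             # B ->
}

metabolites, reaction_names, S = stoichiometric_matrix(toy_network)
for name, row in zip(metabolites, S):
    print(name, row)
# A [1, -1, -1, 0]
# B [0, 1, 0, -1]
```
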

Extreme Pathways 

Price, Reed, Papin, Wiback and Palsson (2003) use singular value
decomposition (SVD) of extreme pathways in order to understand the regulation of human
red blood cell metabolism. Extreme pathways are convex basis vectors that consist of
steady-state functions of a metabolic network (Papin, Price, and Palsson 2002). For any
particular metabolic network, there is always a unique set of extreme pathways available
(Papin et al. 2004). Furthermore, Price et al. (2003) define a constraint-based approach,
in which constraints such as mass balance and maximum reaction rates are used to
develop a 'solution space' within which all feasible options fall. Then, using
a kinetic model approach, a single solution that falls within the extreme pathway solution
space can be determined (Price et al. 2003). Therefore, in their study, Price et al. (2003)
use both constraint-based and kinetic approaches to understand human red blood cell
metabolism. In conclusion, using extreme pathways, the regulatory mechanisms of a
metabolic network can be studied in further detail.

Elementary mode analysis 

Elementary mode analysis closely matches the approach used by extreme pathways. As with
extreme pathways, there is always a unique set of elementary modes available for a
particular metabolic network (Papin et al. 2004). These are the smallest sub-networks that
allow a metabolic reconstruction network to function in steady state (Schuster, Fell, and
Dandekar 2000; Stelling, Klamt, Bettenbrock, Schuster, and Gilles 2002). According to
Stelling et al. (2002), elementary modes can be used to understand cellular objectives for
the overall metabolic network. Furthermore, elementary mode analysis takes into account
stoichiometry and thermodynamics when evaluating whether a particular metabolic route
or network is feasible and likely for a set of proteins/enzymes (Schuster et al. 2000).
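
Both extreme pathways and elementary modes are flux vectors that keep the network in steady state. The sketch below does not compute them; it only checks the conditions that any candidate vector must satisfy, on a hypothetical toy matrix in which all reactions are assumed to be irreversible. The matrix, the candidate vectors and the helper names are illustrative assumptions.

```python
# Toy stoichiometric matrix (rows: metabolites A, B; columns: uptake, conversion,
# secretion, biomass). All four reactions are assumed to be irreversible.
S = [
    [1, -1, -1, 0],
    [0, 1, 0, -1],
]


def is_steady_state(S, v):
    """True if S . v = 0, i.e. no internal metabolite accumulates or is depleted."""
    return all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)


def is_feasible(S, v):
    """Steady state plus non-negative flux through every irreversible reaction."""
    return is_steady_state(S, v) and all(x >= 0 for x in v)


def support(v):
    """Indices of the reactions that carry flux."""
    return {i for i, x in enumerate(v) if x != 0}


route_to_biomass = [1, 1, 0, 1]    # uptake -> conversion -> biomass
route_to_secretion = [1, 0, 1, 0]  # uptake -> secretion
mixture = [a + b for a, b in zip(route_to_biomass, route_to_secretion)]
unbalanced = [1, 1, 1, 1]          # consumes more A than is taken up

for v in (route_to_biomass, route_to_secretion, mixture, unbalanced):
    print(v, is_feasible(S, v))

# The mixture is feasible but not minimal: each route uses a proper subset of its reactions.
print(support(route_to_secretion) < support(mixture))
```

The last line illustrates the "smallest sub-network" idea: a flux vector whose set of active reactions strictly contains that of another feasible vector cannot be elementary.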

Minimal metabolic behaviors (MMBs)

Recently, Larhlimi and Bockmayr (2008) presented a new approach called "minimal 
metabolic behaviors" for the analysis of metabolic networks. Like elementary modes or 
extreme pathways, these are uniquely determined by the network, and yield a complete 
description of the flux cone. However, the new description is much more compact. In
contrast with elementary modes and extreme pathways, which use an inner description
based on generating vectors of the flux cone, MMBs use an outer description of the
flux cone. This approach is based on sets of non-negativity constraints. These can be
identified with irreversible reactions, and thus have a direct biochemical interpretation. 
One can characterize a metabolic network by MMBs and the reversible metabolic space. 

Flux balance analysis 

A different technique to simulate the metabolic network is to perform flux balance analysis.
This method uses linear programming, but in contrast to elementary mode analysis and
extreme pathways, only a single solution results in the end. Linear programming is usually
used to obtain the maximum value of the chosen objective function, and
therefore, when using flux balance analysis, a single solution is found to the optimization
problem (Stelling et al. 2002). In a flux balance analysis approach, exchange fluxes are
assigned only to those metabolites that enter or leave the network. Those
metabolites that are consumed within the network are not assigned any exchange flux
value. Also, the exchange fluxes, along with the fluxes through the enzymes, can have
constraints ranging from a negative to a positive value (e.g., -10 to 10).

Furthermore, this approach can determine whether the reaction stoichiometry is
in line with predictions by providing fluxes for the balanced reactions. Also, flux balance
analysis can highlight the most effective and efficient pathway through the network in
order to achieve a particular objective function. In addition, gene knockout studies can be
performed using flux balance analysis. The enzyme that corresponds to the gene to
be removed is given a constraint value of 0, so that the reaction that the enzyme
catalyzes is effectively removed from the analysis.
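
A flux balance calculation and a knockout can be sketched on a toy network using only the Python standard library. Everything here is an illustrative assumption: the four-reaction network, its bounds and the function names are made up, and the tiny solver simply enumerates the corner points of the feasible region, which is only practical for very small problems. Real analyses use a dedicated linear programming solver.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fba(S, bounds, objective):
    """Maximize objective . v subject to S . v = 0 and lower <= v <= upper.

    Brute-force vertex enumeration: fix n - m fluxes at one of their bounds,
    solve the steady-state equations for the rest, keep the best feasible point.
    Assumes finite bounds and that S has full row rank.
    """
    m, n = len(S), len(S[0])
    best_value, best_flux = None, None
    for fixed in combinations(range(n), n - m):
        free = [j for j in range(n) if j not in fixed]
        for sides in product((0, 1), repeat=n - m):
            v = [None] * n
            for j, side in zip(fixed, sides):
                v[j] = Fraction(bounds[j][side])
            A = [[S[i][j] for j in free] for i in range(m)]
            b = [-sum(S[i][j] * v[j] for j in fixed) for i in range(m)]
            x = solve_square(A, b)
            if x is None:
                continue
            for j, xj in zip(free, x):
                v[j] = xj
            if all(bounds[j][0] <= v[j] <= bounds[j][1] for j in range(n)):
                value = sum(c * vj for c, vj in zip(objective, v))
                if best_value is None or value > best_value:
                    best_value, best_flux = value, v
    return best_value, best_flux


# Toy network (columns: uptake, conversion A->B, secretion of A, biomass from B).
S = [[1, -1, -1, 0],   # metabolite A
     [0, 1, 0, -1]]    # metabolite B
bounds = [(0, 10), (0, 8), (0, 10), (0, 100)]
objective = [0, 0, 0, 1]  # maximize the biomass flux

wild_type, flux = fba(S, bounds, objective)

# Gene knockout: the conversion reaction is constrained to carry zero flux.
knockout_bounds = list(bounds)
knockout_bounds[1] = (0, 0)
knockout, _ = fba(S, knockout_bounds, objective)

print("wild type biomass flux:", wild_type)
print("knockout biomass flux:", knockout)
```

In this toy case the biomass flux is limited by the upper bound of the conversion reaction, and setting that bound to zero removes the only route to biomass. Note that the optimal objective value is unique, while the flux distribution that achieves it need not be.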

Conclusion 

In conclusion, metabolic network reconstruction and simulation can be effectively used to 
understand how an organism or parasite functions inside of the host cell. For example, if 
the parasite serves to compromise the immune system by lysing macrophages, then the 
goal of metabolic reconstruction/simulation would be to determine the metabolites that are 
essential to the organism's proliferation inside of macrophages. If the proliferation cycle is 
inhibited, then the parasite would not continue to evade the host's immune system. A 
reconstruction model serves as a first step to deciphering the complicated mechanisms 
surrounding disease. The next step would be to use the predictions and postulates
generated from a reconstruction model and apply them to drug delivery and drug-engineering
techniques.

Currently, many tropical diseases affecting third world nations are very inadequately 
characterized, and thus poorly understood. Therefore, a metabolic reconstruction and 
simulation of the parasites that cause the tropical diseases would aid in developing new and 
innovative cures and treatments.

See also

• Metabolic network

• Computer simulation

• Computational systems biology

• Metabolic pathway

• Metagenomics

• Metabolic control analysis

External links

• GeneDB [6]

• KEGG [7]

• PathCase, Case Western Reserve University

• BRENDA [9]

• BioCyc and Cyclone: Cyclone provides an open source Java API to the pathway tool
BioCyc to extract metabolic graphs.

• EcoCyc [12]

• MetaCyc [13]

• ENZYME [14]

• SBRI Bioinformatics Tools and Software

• TIGR [16]

• Pathway Tools

• Stanford Genomic Resources

• Pathway Hunter Tool [19]

• IMG: The Integrated Microbial Genomes system, for genome analysis by the DOE-JGI.

• Systems Analysis, Modelling and Prediction Group at the University of Oxford:
biochemical reaction pathway inference techniques.

Metabolic pathway

In biochemistry, a metabolic pathway is a series of chemical reactions occurring within a 
cell. In each pathway, a principal chemical is modified by chemical reactions. Enzymes 
catalyze these reactions, and often require dietary minerals, vitamins, and other cofactors 
in order to function properly. Because of the many chemicals that may be involved, 
pathways can be quite elaborate. In addition, many pathways can exist within a cell. This
collection of pathways is called the metabolic network. Pathways are important to the
maintenance of homeostasis within an organism.

Metabolism is a step-by-step modification of the initial molecule to shape it into another 
product. The result can be used in one of three ways: 

• To be stored by the cell 

• To be used immediately, as a metabolic product 

• To initiate another metabolic pathway, called a flux generating step. 

A molecule called a substrate enters a metabolic pathway depending on the needs of the 
cell and the availability of the substrate. An increase in concentration of anabolic and 
catabolic end-products would slow the metabolic rate for that particular pathway. 
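
The slowing effect of accumulating end-products can be illustrated with a very small simulation. The rate law used below, `vmax / (1 + P / Ki)`, and all parameter values are assumptions chosen for illustration; they are not taken from the text and do not describe any particular enzyme.

```python
def pathway_rate(end_product, vmax=10.0, ki=2.0):
    """Pathway flux that drops as the end-product concentration rises (assumed rate law)."""
    return vmax / (1.0 + end_product / ki)


def simulate(steps=2000, dt=0.01, consumption=0.5):
    """Euler integration of dP/dt = pathway_rate(P) - consumption * P, starting at P = 0."""
    p = 0.0
    history = []
    for _ in range(steps):
        p += dt * (pathway_rate(p) - consumption * p)
        history.append(p)
    return history


history = simulate()
print("rate with no end-product:", pathway_rate(0.0))
print("rate at the final concentration:", pathway_rate(history[-1]))
print("final end-product concentration:", history[-1])
```

With these assumed parameters the end-product level rises until production and consumption cancel, and the pathway then runs well below its maximal rate.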

Overview 

Each metabolic pathway is composed of a series of biochemical reactions that are 
connected by their intermediates: The reactants (or substrates) of one reaction are the 
products of the previous one, and so on. Metabolic pathways are usually considered in one 
direction (although all reactions are chemically reversible, conditions in the cell are such 
that it is thermodynamically more favorable for flux to be in one of the directions). 

• Glycolysis was the first metabolic pathway discovered: 

1. As glucose enters a cell, it is immediately phosphorylated by ATP to glucose 
6-phosphate in the irreversible first step. This is to prevent the glucose from leaving 
the cell. 

2. In times of excess lipid or protein energy sources, glycolysis may run in reverse 
(gluconeogenesis) in order to produce glucose 6-phosphate for storage as glycogen or 
starch. 

• Metabolic pathways are often regulated by feedback inhibition, or by a cycle wherein one 
of the products in the cycle starts the reaction again, such as the Krebs Cycle (see 
below). 

• Anabolic and catabolic pathways in eukaryotes are separated either by compartmentation
or by the use of different enzymes and cofactors.

Major metabolic pathways




Figure: map of the major metabolic pathways. Pathways labeled on the image: glucuronate
metabolism; pentose interconversion; inositol metabolism; cellulose and sucrose metabolism;
starch and glycogen metabolism; other sugar metabolism; pentose phosphate pathway;
glycolysis and gluconeogenesis; amino sugars metabolism; small amino acid synthesis;
branched amino acid synthesis; purine biosynthesis; histidine metabolism; aromatic amino
acid synthesis; pyruvate decarboxylation; fermentation; fatty acid metabolism; urea cycle;
aspartate amino acid group synthesis; porphyrins and corrinoids metabolism; citric acid
cycle; glutamate amino acid group synthesis; pyrimidine biosynthesis.

Cellular respiration 

Several distinct but linked metabolic pathways are used by cells to transfer the energy 
released by breakdown of fuel molecules to ATP. These occur within all living organisms in 
some forms: 

1. Glycolysis 

2. Anaerobic respiration 

3. Krebs cycle / Citric acid cycle 

4. Oxidative phosphorylation 

Other pathways occurring in (most or) all living organisms include: 

• Fatty acid oxidation (β-oxidation)

• Gluconeogenesis 

• HMG-CoA reductase pathway (isoprene prenylation chains, see cholesterol) 

• Pentose phosphate pathway (hexose monophosphate shunt) 

• Porphyrin synthesis (or heme synthesis) pathway 

• Urea cycle 

Creation of energetic compounds from non-living matter: 

• Photosynthesis (plants, algae, cyanobacteria) 

• Chemosynthesis (some bacteria)

See also

• Metabolism 

• Metabolic network

• Metabolic network modelling

External links 

• BioCyc: Metabolic network models for hundreds of organisms 

• KEGG: Kyoto Encyclopedia of Genes and Genomes

• MetaCyc: A database of nonredundant, experimentally elucidated metabolic pathways 
(900+ pathways from more than 800 different organisms). 

• Metabolism, Cellular Respiration and Photosynthesis - The Virtual Library of 
Biochemistry and Cell Biology 

• PathCase Pathways Database System 

• Interactive Flow Chart of the Major Metabolic Pathways 

• A novel visualization for a Metabolic Pathway 

• DAVID: Visualize genes on pathway maps

• Wikipathways: pathways for the people

• ConsensusPathDB [10]

Interaction network

An interaction network is a network of nodes that are connected by features. If the feature is
physical and molecular, the interaction network represents the molecular interactions usually
found in cells. Interaction networks have become a research topic in biology in recent years
due to rapid progress in high-throughput data production.
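
A common way to hold such a network in a program is an adjacency structure that maps each node to the set of nodes it interacts with. The sketch below is a minimal illustration; the protein names P1 to P4 and the helper functions are placeholders, not data from any interaction database.

```python
from collections import defaultdict


def build_network(interactions):
    """Turn a list of undirected (node, node) pairs into {node: set of neighbours}."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def degree(network, node):
    """Number of interaction partners of a node (0 if the node is unknown)."""
    return len(network.get(node, ()))


# Placeholder interaction list for illustration.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(interactions)

hub = max(network, key=lambda node: degree(network, node))
print("most connected node:", hub, "with", degree(network, hub), "partners")
print("partners of P4:", sorted(network["P4"]))
```
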

See also 

• Protein-protein interaction

• [[Interac 

External links 

• Interactomics.org : Biological interaction research information site. 

• BIND database Canada [2] 

• VirHostNet - Virus-Host protein-protein interaction Networks knowledgebase

Interactomics



Interactomics is a discipline at the intersection of bioinformatics and biology that deals
with studying both the interactions and the consequences of those interactions between
and among proteins and other molecules within a cell. The network of all such
interactions is called the interactome. Interactomics thus aims to compare such networks of
interactions (i.e., interactomes) between and within species in order to find how the traits
of such networks are either preserved or varied. From a mathematical, or mathematical
biology, viewpoint an interactome network is a graph or a category representing the most
important interactions pertinent to the normal physiological functions of a cell or organism.

Interactomics is an example of "top-down" systems biology, which takes an overhead, as 
well as overall, view of a biosystem or organism. Large sets of genome-wide and proteomic 
data are collected, and correlations between different molecules are inferred. From the 
data new hypotheses are formulated about feedbacks between these molecules. These 
hypotheses can then be tested by new experiments.

Through the study of the interactions of all of the molecules in a cell, the field looks to
gain a deeper understanding of genome function and evolution than can be obtained by
examining an individual genome in isolation. Interactomics goes beyond cellular proteomics
in that it not only attempts to characterize the interactions between proteins, but between
all molecules in the cell.

Methods of interactomics

The study of the interactome requires the collection of large amounts of data by way of
high-throughput experiments. Through these experiments, a large number of data points are
collected from a single organism under a small number of perturbations. These
experiments include:

• Two-hybrid screening 

• Tandem Affinity Purification 

• X-ray tomography 

• Optical fluorescence microscopy 



Recent developments 

The field of interactomics is currently expanding and developing rapidly. While no
biological interactome has been fully characterized, over 90% of proteins in
Saccharomyces cerevisiae have been screened and their interactions characterized, making
it the first interactome to be nearly fully specified. There have also been recent
systematic attempts to explore the human interactome.

Other species whose interactomes have been studied in some detail include Caenorhabditis
elegans and Drosophila melanogaster.

Criticisms and concerns

Kiemer and Cesareni raise the following concerns with the current state of the field: 

• The experimental procedures associated with the field are error-prone, leading to "noisy
results". As a result, 30% of all reported interactions are artifacts. In fact, two groups
using the same techniques on the same organism found fewer than 30% of interactions in
common.

• Techniques may be biased, i.e. the technique determines which interactions are found.

• Interactomes are not nearly complete, with perhaps the exception of S. cerevisiae.

• While genomes are stable, interactomes may vary between tissues and developmental
stages.

• Genomics compares amino acids and nucleotides, which are in a sense unchangeable, but
interactomics compares proteins and other molecules, which are subject to mutation and
evolution.

• It is difficult to match evolutionarily related proteins in distantly related species.
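
The agreement between two interaction screens, as mentioned in the first point above, can be quantified in several ways. The sketch below uses the shared fraction of all reported pairs (the Jaccard index), which is an assumption: the text does not say which measure was used. The two study lists are made-up placeholders.

```python
def normalize(pairs):
    """Treat interactions as unordered pairs, so (A, B) and (B, A) count as the same."""
    return {frozenset(pair) for pair in pairs}


def overlap_fraction(study_a, study_b):
    """Shared interactions divided by all interactions reported by either study."""
    a, b = normalize(study_a), normalize(study_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder results of two hypothetical screens of the same organism.
study_a = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
study_b = [("B", "A"), ("C", "E"), ("D", "E"), ("A", "D")]

print("shared fraction:", round(overlap_fraction(study_a, study_b), 3))
```
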

See also 

• Interaction network

• Proteomics

• Metabolic network

• Metabolic network modelling

• Metabolic pathway

• Genomics

• Mathematical biology

• Systems biology

External links

• Interactomics.org (http://interactomics.org). A dedicated interactomics web site 
operated under BioLicense. 

• Interactome.org (http://interactome.org). An interactome wiki site. 

• PSIbase (http://psibase.kobic.re.kr) Structural Interactome Map of all Proteins. 

• Omics.org (http://omics.org). An omics portal site that is openfree (under BioLicense) 

• Genomics.org (http://genomics.org). A Genomics wiki site. 

• Comparative Interactomics analysis of protein family interaction networks using PSIMAP
(protein structural interactome map)
(http://bioinformatics.oxfordjournals.org/cgi/content/full/21/15/3234)

• Interaction interfaces in proteins via the Voronoi diagram of atoms
(http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TYR-4KXVD30-2&_user=10&_coverDate=11/30/2006&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=8361bf3fe7834b4642cdda3b979de8bb)

• Using convex hulls to extract interaction interfaces from known structures. Panos Dafas, 
Dan Bolser, Jacek Gomoluch, Jong Park, and Michael Schroeder. Bioinformatics 2004 20: 
1486-1490. 

• PSIbase: a database of Protein Structural Interactome map (PSIMAP). Sungsam Gong, 
Giseok Yoon, Insoo Jang Bioinformatics 2005. 

• Mapping Protein Family Interactions: Intramolecular and Intermolecular Protein Family
Interaction Repertoires in the PDB and Yeast. Jong Park, Michael Lappe and Sarah A.
Teichmann. J.M.B. (2001).

• Semantic Systems Biology (http://www.semantic-systems-biology.org)

Related fields



Mathematical biology 



Mathematical biology is also called theoretical biology, and sometimes 
biomathematics. It includes at least four major subfields: biological mathematical 
modeling, relational biology/complex systems biology (CSB), bioinformatics and 
computational biomodeling/biocomputing. It is an interdisciplinary academic research field 
with a wide range of applications in biology, medicine and biotechnology.

Mathematical biology aims at the mathematical representation, treatment and modeling of 
biological processes, using a variety of applied mathematical techniques and tools. It has 
both theoretical and practical applications in biological, biomedical and biotechnology 
research. For example, in cell biology, protein interactions are often represented as
"cartoon" models, which, although easy to visualize, do not accurately describe the systems
studied. An accurate description requires precise mathematical models. By describing the
systems in a quantitative manner, their behavior can be better simulated, and hence 
properties can be predicted that might not be evident to the experimenter. 

Importance 

Applying mathematics to biology has a long history, but only recently has there been an 
explosion of interest in the field. Some reasons for this include: 

• the explosion of data-rich information sets, due to the genomics revolution, which are
difficult to understand without the use of analytical tools,

• recent development of mathematical tools such as chaos theory to help understand 
complex, nonlinear mechanisms in biology, 

• an increase in computing power which enables calculations and simulations to be 
performed that were not previously possible, and 

• an increasing interest in in silico experimentation due to ethical considerations, risk, 
unreliability and other complications involved in human and animal research. 

For the use of basic arithmetic in biology, see the relevant topic, such as Serial dilution.

Areas of research 

Several areas of specialized research in mathematical and theoretical biology, as well as
external links to related projects in various universities, are concisely presented in the
following subsections, together with validating references drawn from the several thousand
published authors contributing to this field. Many of the included examples are
characterised by highly complex, nonlinear, and supercomplex mechanisms, as it is being
increasingly recognised that the result of such interactions may only be understood through
a combination of mathematical, logical, physical/chemical, molecular and computational
models. Due to the wide diversity of specific knowledge involved, biomathematical research
is often done in collaboration between mathematicians, biomathematicians, theoretical
biologists, physicists, biophysicists, biochemists, bioengineers, engineers, biologists,
physiologists, research physicians, biomedical researchers, oncologists, molecular
biologists, geneticists, embryologists, zoologists, chemists, etc.

Computer models and automata theory 

A monograph on this topic summarizes an extensive amount of published research in this
area up to 1987, including subsections in the following areas: computer modeling in
biology and medicine, arterial system models, neuron models, biochemical and oscillation
networks, quantum automata, quantum computers in molecular biology and genetics,
cancer modelling, neural nets, genetic networks, abstract relational biology,
metabolic-replication systems, category theory applications in biology and medicine,
automata theory, cellular automata, tessellation models and complete self-reproduction,
chaotic systems in organisms, relational biology and organismic theories. This published
report also includes 390 references to peer-reviewed articles by a large number of authors.

Modeling cell and molecular biology 

This area has received a boost due to the growing importance of molecular biology.

• Mechanics of biological tissues

• Theoretical enzymology and enzyme kinetics

• Cancer modelling and simulation

• Modelling the movement of interacting cell populations

• Mathematical modelling of scar tissue formation

• Mathematical modelling of intracellular dynamics

• Mathematical modelling of the cell cycle



Modelling physiological systems 

• Modelling of arterial disease

• Multi-scale modelling of the heart



Molecular set theory 

Molecular set theory was introduced by Anthony Bartholomay, and its applications were
developed in mathematical biology and especially in mathematical medicine. Molecular
set theory (MST) is a mathematical formulation of the wide-sense chemical kinetics of
biomolecular reactions in terms of sets of molecules and their chemical transformations
represented by set-theoretical mappings between molecular sets. In a more general sense,
MST is the theory of molecular categories defined as categories of molecular sets and their
chemical transformations represented as set-theoretical mappings of molecular sets. The
theory has also contributed to biostatistics and to the mathematical formulation of clinical
biochemistry problems concerning pathological, biochemical changes of interest to
physiology, clinical biochemistry and medicine.
Population dynamics

Population dynamics has traditionally been the dominant field of mathematical biology. 
Work in this area dates back to the 19th century. The Lotka-Volterra predator-prey 
equations are a famous example. In the past 30 years, population dynamics has been 
complemented by evolutionary game theory, developed first by John Maynard Smith. Under 
these dynamics, evolutionary biology concepts may take a deterministic mathematical form. 
Population dynamics overlaps with another active area of research in mathematical biology:
mathematical epidemiology, the study of infectious disease affecting populations. Various 
models of viral spread have been proposed and analyzed, and provide important results that 
may be applied to health policy decisions. 
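
As an illustration of the predator-prey equations mentioned above, the following is a minimal Python sketch that integrates the Lotka-Volterra system with a simple forward-Euler scheme. The function names, the parameter values (alpha, beta, delta, gamma) and the step size are assumptions chosen for illustration only; they are not taken from any particular study, and a production model would normally use a higher-order integrator.

```python
def lotka_volterra_step(prey, pred, dt, alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Advance the predator-prey system by one forward-Euler step.

    d(prey)/dt = alpha*prey - beta*prey*pred
    d(pred)/dt = delta*prey*pred - gamma*pred
    All parameter values are illustrative assumptions.
    """
    d_prey = alpha * prey - beta * prey * pred
    d_pred = delta * prey * pred - gamma * pred
    return prey + dt * d_prey, pred + dt * d_pred


def simulate(prey0, pred0, dt=0.001, steps=20000):
    """Return the list of (prey, predator) states visited from the initial state."""
    trajectory = [(prey0, pred0)]
    prey, pred = prey0, pred0
    for _ in range(steps):
        prey, pred = lotka_volterra_step(prey, pred, dt)
        trajectory.append((prey, pred))
    return trajectory


if __name__ == "__main__":
    traj = simulate(10.0, 5.0)
    print("final state:", traj[-1])
```

With these assumed parameters the coexistence equilibrium lies at prey = gamma/delta and predators = alpha/beta; starting away from it, the populations oscillate around that point.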

Mathematical methods 

A model of a biological system is converted into a system of equations, although the word 
'model' is often used synonymously with the system of corresponding equations. The 
solution of the equations, by either analytical or numerical means, describes how the 
biological system behaves either over time or at equilibrium. There are many different 
types of equations and the type of behavior that can occur is dependent on both the model 
and the equations used. The model often makes assumptions about the system. The 
equations may also make assumptions about the nature of what may occur. 

Mathematical biophysics 

The earlier stages of mathematical biology were dominated by mathematical biophysics, 
described as the application of mathematics in biophysics, often involving specific 
physical/mathematical models of biosystems and their components or compartments. 

The following is a list of mathematical descriptions and their assumptions. 

Deterministic processes (dynamical systems) 

A fixed mapping between an initial state and a final state. Starting from an initial condition 
and moving forward in time, a deterministic process will always generate the same 
trajectory and no two trajectories cross in state space. 

• Difference equations - discrete time, continuous state space. 

• Ordinary differential equations - continuous time, continuous state space, no spatial 
derivatives. See also: Numerical ordinary differential equations. 

• Partial differential equations - continuous time, continuous state space, spatial 
derivatives. See also: Numerical partial differential equations. 

• Maps - discrete time, continuous state space. 
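
A difference equation (map) of the kind listed above can be sketched in a few lines of Python. The logistic map used here is a standard textbook example of a discrete-time, continuous-state model; the function name and the chosen growth rates are assumptions for illustration.

```python
def logistic_map(x0, r, n_steps):
    """Iterate the logistic map x[n+1] = r * x[n] * (1 - x[n]).

    x0 is the initial (scaled) population in [0, 1]; r is the growth rate.
    Returns the list of all n_steps + 1 visited states.
    """
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


if __name__ == "__main__":
    # For r = 2 the iterates settle on the fixed point 1 - 1/r.
    print(logistic_map(0.1, 2.0, 50)[-1])
    # For r = 3.9 the same deterministic rule gives irregular behaviour.
    print(logistic_map(0.1, 3.9, 10))
```

Because the rule is deterministic, the same initial condition always reproduces the same sequence, which is the defining property stated at the start of this subsection.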

Stochastic processes (random dynamical systems) 

A random mapping between an initial state and a final state, making the state of the system 
a random variable with a corresponding probability distribution. 

• Non-Markovian processes - generalized master equation - continuous time with memory
of past events, discrete state space; events (or transitions between states) occur at
discrete instants, and their waiting times follow a generalized probability distribution.

• Jump Markov process - master equation - continuous time with no memory of past
events, discrete state space; events occur at discrete instants, and the waiting times
between them are exponentially distributed. See also: Monte Carlo method for numerical
simulation methods, specifically continuous-time Monte Carlo, which is also called
kinetic Monte Carlo or the stochastic simulation algorithm.

• Continuous Markov process - stochastic differential equations or a Fokker-Planck 
equation - continuous time, continuous state space, events occur continuously according 
to a random Wiener process. 
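
The jump Markov process described in the list above can be simulated with the stochastic simulation algorithm. The sketch below applies it to a simple birth-death process (constant production, first-order degradation); the function name, the choice of model and the rate values are assumptions made for illustration.

```python
import random


def gillespie_birth_death(n0, birth_rate, death_rate, t_end, seed=0):
    """Simulate a birth-death process with the stochastic simulation algorithm.

    Births occur at constant rate birth_rate; each individual dies at rate
    death_rate. Waiting times between events are exponentially distributed.
    Returns (times, counts) for every event up to t_end.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth_propensity = birth_rate
        death_propensity = death_rate * n
        total = birth_propensity + death_propensity
        if total == 0.0:
            break  # no event can occur any more
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        # Choose which event fires, in proportion to its propensity.
        if rng.random() * total < birth_propensity:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = gillespie_birth_death(0, birth_rate=10.0, death_rate=1.0, t_end=20.0)
    print("events:", len(times) - 1, "final count:", counts[-1])
```

Each run with a different seed gives a different trajectory, so the state at a given time is a random variable, in contrast to the deterministic descriptions above.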

Spatial modelling 

One classic work in this area is Alan Turing's paper on morphogenesis entitled The 
Chemical Basis of Morphogenesis, published in 1952 in the Philosophical Transactions of 
the Royal Society. 

• Travelling waves in a wound-healing assay

• Swarming behaviour

• A mechanochemical theory of morphogenesis

• Biological pattern formation

• Spatial distribution modeling using plot samples

Phylogenetics 

Phylogenetics is an area of mathematical biology that deals with the reconstruction and 
analysis of phylogenetic (evolutionary) trees and networks based on inherited 
characteristics. The main mathematical concepts are trees, X-trees and maximum 
parsimony trees. 
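
To make the idea of parsimony concrete, the following Python sketch computes the minimum number of state changes needed to explain one character on a fixed rooted binary tree, using Fitch's small-parsimony algorithm. The nested-tuple tree representation and the function name are assumptions chosen for brevity; finding the maximum parsimony tree itself additionally requires searching over tree topologies, which is not shown.

```python
def fitch_score(tree):
    """Return (candidate_states, changes) for a rooted binary tree.

    A leaf is a one-character string giving its observed state; an internal
    node is a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left)
    right_set, right_cost = fitch_score(right)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("A", "A"), ("C", "A"))
    states, changes = fitch_score(tree)
    print(states, changes)
```

Among candidate trees for the same observed characters, the one with the smallest total number of changes is preferred under the parsimony criterion.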

Model example: the cell cycle 

The eukaryotic cell cycle is very complex and is one of the most studied topics, since its
misregulation leads to cancers. It is possibly a good example of a mathematical model as it
deals with simple calculus but gives valid results. Two research groups have produced
several models of the cell cycle simulating several organisms. They have recently
produced a generic eukaryotic cell cycle model which can represent a particular eukaryote
depending on the values of the parameters, demonstrating that the idiosyncrasies of the
individual cell cycles are due to different protein concentrations and affinities, while the
underlying mechanisms are conserved (Csikasz-Nagy et al., 2006).

By means of a system of ordinary differential equations these models show the change in
time (dynamical system) of the protein concentrations inside a single typical cell; this
type of model is called a deterministic process (whereas a model describing a statistical
distribution of protein concentrations in a population of cells is called a stochastic
process).

To obtain these equations an iterative series of steps must be done. First, the several
models and observations are combined to form a consensus diagram, and the appropriate
kinetic laws are chosen to write the differential equations, such as rate kinetics for
stoichiometric reactions, Michaelis-Menten kinetics for enzyme substrate reactions and
Goldbeter-Koshland kinetics for ultrasensitive transcription factors. Afterwards, the
parameters of the equations (rate constants, enzyme efficiency coefficients and Michaelis
constants) must be fitted to match observations; when they cannot be fitted the kinetic
equation is revised, and when that is not possible the wiring diagram is modified. The
parameters are fitted and validated using observations of both wild type and mutants, such
as protein half-life and cell size.
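
The three kinetic laws named above can be written as small Python functions. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the Goldbeter-Koshland expression is the commonly quoted closed form for the steady-state active fraction of a covalently modified protein, not a formula given in this text.

```python
import math


def mass_action(k, concentrations):
    """Rate of a stoichiometric reaction: k times the product of reactant concentrations."""
    rate = k
    for c in concentrations:
        rate *= c
    return rate


def michaelis_menten(v_max, k_m, substrate):
    """Rate of an enzyme-catalysed reaction with Michaelis constant k_m."""
    return v_max * substrate / (k_m + substrate)


def goldbeter_koshland(v1, v2, j1, j2):
    """Steady-state active fraction for activation rate v1 and inactivation rate v2.

    j1 and j2 are the (scaled) Michaelis constants of the two converting
    enzymes; small values give a switch-like (ultrasensitive) response.
    """
    b = v2 - v1 + j1 * v2 + j2 * v1
    return 2.0 * v1 * j2 / (b + math.sqrt(b * b - 4.0 * (v2 - v1) * v1 * j2))


if __name__ == "__main__":
    print(mass_action(2.0, [3.0, 4.0]))
    print(michaelis_menten(10.0, 2.0, 2.0))
    print(goldbeter_koshland(2.0, 1.0, 0.01, 0.01))
```

In a cell cycle model, expressions of this kind appear as the right-hand-side terms of the differential equations, and their constants are the parameters that are fitted to observations.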

In order to fit the parameters the differential equations need to be studied. This can be
done either by simulation or by analysis.

In a simulation, given a starting vector (list of the values of the variables), the progression
of the system is calculated by solving the equations at each time-frame in small increments.

[Figure: bifurcation diagram, showing fixed points]

In analysis, the properties of the equations are used to investigate the behavior of the
system depending on the values of the parameters and variables. A system of differential
equations can be represented as a vector field, where each vector describes the change (in
concentration of two or more proteins), determining where and how fast the trajectory
(simulation) is heading. Vector fields can have several special points: a stable point,
called a sink, that attracts in all directions (forcing the concentrations to be at a
certain value), an unstable point, either a source or a saddle point, which repels (forcing
the concentrations to change away from a certain value), and a limit cycle, a closed
trajectory towards which several trajectories spiral (making the concentrations oscillate).
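
For a two-variable system, the type of a fixed point can be read off from the Jacobian matrix of the vector field evaluated at that point. The following Python sketch classifies a fixed point from the trace and determinant of a 2x2 Jacobian; the function name and the matrix-entry arguments are assumptions for illustration, and the result is a linear approximation only.

```python
def classify_fixed_point(a, b, c, d):
    """Classify the fixed point whose Jacobian is [[a, b], [c, d]].

    Uses the trace and determinant of the linearised system. The "center"
    case is marginal: nonlinear terms decide the actual behaviour.
    """
    trace = a + d
    det = a * d - b * c
    if det < 0:
        return "saddle"      # one attracting and one repelling direction
    if det == 0:
        return "degenerate"  # linearisation is inconclusive
    if trace < 0:
        return "sink"        # attracts in all directions
    if trace > 0:
        return "source"      # repels in all directions
    return "center"


if __name__ == "__main__":
    print(classify_fixed_point(-1.0, 0.0, 0.0, -2.0))
    print(classify_fixed_point(1.0, 0.0, 0.0, -1.0))
```

A change of classification as a parameter is varied, for example a sink turning into a source, is the kind of qualitative change that bifurcation analysis tracks.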

A better representation, which can handle the large number of variables and parameters, is
called a bifurcation diagram (see bifurcation theory). The presence of these special
steady-state points at certain values of a parameter (e.g. mass) is represented by a point;
once the parameter passes a certain value, a qualitative change occurs, called a
bifurcation, in which the nature of the space changes, with profound consequences for the
protein concentrations. The cell cycle has phases (partially corresponding to G1 and G2) in
which mass, via a stable point, controls cyclin levels, and phases (S and M phases) in which
the concentrations change independently. Once the phase has changed at a bifurcation event
(cell cycle checkpoint), the system cannot go back to the previous levels, since at the
current mass the vector field is profoundly different and the mass cannot be reversed back
through the bifurcation event, making a checkpoint irreversible. In particular, the S and M
checkpoints are regulated by means of special bifurcations called a Hopf bifurcation and an
infinite period bifurcation.



Mathematical/theoretical biologists 

Pere Alberch 
Anthony F. Bartholomay 
J. T. Bonner 
Jack Cowan 
Gerd B. Müller
Walter M. Elsasser 
Claus Emmeche 
Andrée Ehresmann
Marc Feldman 
Ronald A. Fisher 
Brian Goodwin 
Bryan Grenfell 
J. B. S. Haldane

William D. Hamilton 

Lionel G. Harrison 

Michael Hassell 

Sven Erik Jørgensen

George Karreman 

Stuart Kauffman 

Kalevi Kull 

Herbert D. Landahl 

Richard Lewontin 

Humberto Maturana 

Robert May 

John Maynard Smith 

Howard Pattee 

George R. Price 

Erik Rauch 

Nicolas Rashevsky 

Ronald Brown (mathematician) 

Johannes Reinke 

Robert Rosen 

René Thom

Jakob von Uexküll

Robert Ulanowicz 

Francisco Varela 

C. H. Waddington 

Arthur Winfree 

Lewis Wolpert 

Sewall Wright 

Christopher Zeeman 

Mathematical, theoretical and computational biophysicists 

Nicolas Rashevsky 
Ludwig von Bertalanffy 
Francis Crick 
Manfred Eigen 
Walter Elsasser 
Herbert Fröhlich, FRS
François Jacob
Martin Karplus 
George Karreman 
Herbert D. Landahl 
Ilya, Viscount Prigogine 
Sir John Randall
James D. Murray 
Bernard Pullman 
Alberte Pullman 
Erwin Schrödinger
Klaus Schulten
Peter Schuster
Zeno Simon
D'Arcy Thompson
Murray Gell-Mann

See also 

Abstract relational biology [42][43] [44] 

Biocybernetics 

Bioinformatics

Biologically-inspired computing 

Biostatistics 

Cellular automata 

Coalescent theory 

Complex systems biology

Computational biology 

Dynamical systems in biology [49] [50] [51] [52] [53] [54] 

Epidemiology 

Evolution theories and Population Genetics 

• Population genetics models 

• Molecular evolution theories 
Ewens's sampling formula 
Excitable medium 
Mathematical models 

• Molecular modelling 

• Software for molecular modeling 

• Metabolic-replication systems 

• Models of Growth and Form 

• Neighbour-sensing model 
Morphometrics

Organismic systems (OS) [57][58] 
Organismic supercategories 
Population dynamics of fisheries 
Protein folding, also Blue Gene and Folding@home
Quantum computers 
Quantum genetics 
Relational biology 

Self-reproduction (also called self-replication in a more general context)
Computational gene models
Systems biology
Theoretical biology 
Topological models of morphogenesis 

• DNA topology 

• DNA sequencing theory 

Biographies 
Charles Darwin
D'Arcy Thompson
Joseph Fourier
Charles S. Peskin
Nicolas Rashevsky [66]
Robert Rosen
Rosalind Franklin
Francis Crick
René Thom
Vito Volterra
External links

• Theoretical and mathematical biology website (http://www.kli.ac.at/theorylab/index.html)

• Complexity Discussion Group (http://www.complex.vcu.edu/)

• Integrative cancer biology modeling and Complex systems biology
(http://fs512.fshn.uiuc.edu/ComplexSystemsBiology.htm)

• UCLA Biocybernetics Laboratory (http://biocyb.cs.ucla.edu/research.html)

• TUCS Computational Biomodelling Laboratory (http://www.tucs.fi/research/labs/combio.php)

• Nagoya University Division of Biomodeling
(http://www.agr.nagoya-u.ac.jp/english/e3senko-l.html)

• Technische Universiteit Biomodeling and Informatics (http://www.bmi2.bmt.tue.nl/Biomedinf/)

• BioCybernetics Wiki, a vertical wiki on biomedical cybernetics and systems biology
(http://wiki.biological-cybernetics.de)

• Society for Mathematical Biology (http://www.smb.org/)

• Bulletin of Mathematical Biology (http://www.springerlink.com/content/119979/)

• European Society for Mathematical and Theoretical Biology (http://www.esmtb.org/)

• Journal of Mathematical Biology (http://www.springerlink.com/content/100436/)

• Biomathematics Research Centre at University of Canterbury
(http://www.math.canterbury.ac.nz/bio/)

• Centre for Mathematical Biology at Oxford University (http://www.maths.ox.ac.uk/cmb/)

• Mathematical Biology at the National Institute for Medical Research
(http://mathbio.nimr.mrc.ac.uk/)

• Institute for Medical BioMathematics (http://www.imbm.org/)

• Mathematical Biology Systems of Differential Equations
(http://eqworld.ipmnet.ru/en/solutions/syspde/spde-toc2.pdf) from EqWorld: The World of
Mathematical Equations

• Systems Biology Workbench - a set of tools for modelling biochemical networks
(http://sbw.kgi.edu)

• The Collection of Biostatistics Research Archive
(http://www.biostatsresearch.com/repository/)

• Statistical Applications in Genetics and Molecular Biology (http://www.bepress.com/sagmb/)

• The International Journal of Biostatistics (http://www.bepress.com/ijb/)



Systems biology 



Systems biology is a biology-based inter-disciplinary study field that focuses on the
systematic study of complex interactions in biological systems, thus using a new
perspective (holism instead of reduction) to study them. Particularly from year 2000
onwards, the term is used widely in the biosciences, and in a variety of contexts.
Because the scientific method has been used primarily toward reductionism, one of the goals
of systems biology is to discover new emergent properties that may arise from the systemic
view used by this discipline in order to understand better the entirety of processes that
happen in a biological system.




Overview 

Systems biology can be considered from a number of different aspects: 

• Some sources discuss systems biology as a field of study, particularly, the study of the
interactions between the components of biological systems, and how these interactions
give rise to the function and behavior of that system (for example, the enzymes and
metabolites in a metabolic pathway).

• Other sources consider systems biology as a paradigm, usually defined in antithesis to 
the so-called reductionist paradigm, although fully consistent with the scientific method. 
The distinction between the two paradigms is referred to in these quotations:

"The reductionist approach has successfully identified most of the components and
many of the interactions but, unfortunately, offers no convincing concepts or methods
to understand how system properties emerge... the pluralism of causes and effects in
biological networks is better addressed by observing, through quantitative measures,
multiple components simultaneously and by rigorous data integration with
mathematical models" (Science)

"Systems biology... is about putting together rather than taking apart, integration
rather than reduction. It requires that we develop ways of thinking about integration
that are as rigorous as our reductionist programmes, but different.... It means changing
our philosophy, in the full sense of the term" (Denis Noble)

• Still other sources view systems biology in terms of the operational protocols used for 
performing research, namely a cycle composed of theory, analytic or computational 
modelling to propose specific testable hypotheses about a biological system, 
experimental validation, and then using the newly acquired quantitative description of 
cells or cell processes to refine the computational model or theory. Since the 
objective is a model of the interactions in a system, the experimental techniques that 
most suit systems biology are those that are system-wide and attempt to be as complete 
as possible. Therefore, transcriptomics, metabolomics, proteomics and
high-throughput techniques are used to collect quantitative data for the construction and 
validation of models. 

• Engineers consider systems biology as the application of dynamical systems theory to 
molecular biology. 

• Finally, some sources see it as a socioscientific phenomenon defined by the strategy of 
pursuing integration of complex data about the interactions in biological systems from 
diverse experimental sources using interdisciplinary tools and personnel. 

This variety of viewpoints illustrates that systems biology refers to a cluster of
peripherally overlapping concepts rather than a single well-delineated field. However, the
term has widespread currency and popularity as of 2007, with chairs and institutes of
systems biology proliferating worldwide (such as the Institute for Systems Biology).

History 

Systems biology finds its roots in: 

• the quantitative modelling of enzyme kinetics, a discipline that flourished between 1900 
and 1970, 

• the simulations developed to study neurophysiology, and 

• control theory and cybernetics. 

One of the theorists who can be seen as a precursor of systems biology is Ludwig von
Bertalanffy with his general systems theory. One of the first numerical simulations in
biology was published in 1952 by the British neurophysiologists and Nobel prize winners
Alan Lloyd Hodgkin and Andrew Fielding Huxley, who constructed a mathematical model
that explained the action potential propagating along the axon of a neuronal cell. Their
model described a cellular function emerging from the interaction between two different
molecular components, a potassium channel and a sodium channel, and can therefore be seen
as the beginning of computational systems biology. In 1960, Denis Noble developed the first
computer model of the heart pacemaker.

The formal study of systems biology, as a distinct discipline, was launched by systems
theorist Mihajlo Mesarovic in 1966 with an international symposium at the Case Institute of
Technology in Cleveland, Ohio entitled "Systems Theory and Biology."

The 1960s and 1970s saw the development of several approaches to study complex 
molecular systems, such as the Metabolic Control Analysis and the biochemical systems 
theory. The successes of molecular biology throughout the 1980s, coupled with a skepticism 
toward theoretical biology, that then promised more than it achieved, caused the 
quantitative modelling of biological processes to become a somewhat minor field. 

However the birth of functional genomics in the 1990s meant that large quantities of high 
quality data became available, while the computing power exploded, making more realistic 
models possible. In 1997, the group of Masaru Tomita published the first quantitative 
model of the metabolism of a whole (hypothetical) cell. 

Around the year 2000, when Institutes of Systems Biology were established in Seattle and
Tokyo, systems biology emerged as a movement in its own right, spurred on by the
completion of various genome projects, the large increase in data from the omics (e.g.
genomics and proteomics) and the accompanying advances in high-throughput
experiments and bioinformatics. Since then, various research institutes dedicated to
systems biology have been developed. As of summer 2006, due to a shortage of people in
systems biology, several doctoral training centres in systems biology have been
established in many parts of the world.



Techniques associated with systems biology 




According to the interpretation of systems biology as the ability to obtain, integrate and
analyze complex data from multiple experimental sources using interdisciplinary tools,
some typical technology platforms are:

• Transcriptomics: whole cell or tissue gene expression measurements by DNA microarrays
or serial analysis of gene expression

• Proteomics: complete identification of proteins and protein expression patterns of a cell
or tissue through two-dimensional gel electrophoresis and mass spectrometry or
multi-dimensional protein identification techniques (advanced HPLC systems coupled with
mass spectrometry). Subdisciplines include phosphoproteomics, glycoproteomics and other
methods to detect chemically modified proteins.

• Metabolomics: identification and measurement of all small-molecule metabolites within
a cell or tissue

• Glycomics: identification of the entirety of all carbohydrates in a cell or tissue. 

In addition to the identification and quantification of the molecules given above, further
techniques analyze the dynamics and interactions within a cell. These include:

• Interactomics, which is used mostly in the context of protein-protein interaction but in
theory encompasses interactions between all molecules within a cell,

• Fluxomics, which deals with the dynamic changes of molecules within a cell over time,

• Biomics: systems analysis of the biome.

The investigations are frequently combined with large scale perturbation methods,
including gene-based (RNAi, mis-expression of wild type and mutant genes) and chemical
approaches using small molecule libraries. Robots and automated sensors enable such
large-scale experimentation and data acquisition. These technologies are still emerging,
and many face the problem that the larger the quantity of data produced, the lower its
quality. A wide variety of quantitative scientists (computational biologists, statisticians,
mathematicians, computer scientists, engineers, and physicists) are working to improve the
quality of these approaches and to create, refine, and retest the models so that they
accurately reflect observations.

The investigations of a single level of biological organization (such as those listed above)
are usually referred to as Systematic Systems Biology. Other areas of systems biology
include Integrative Systems Biology, which seeks to integrate different types of
information to advance the understanding of the biological whole, and Dynamic Systems
Biology, which aims to uncover how the biological whole changes over time (during
evolution, for example, at the onset of disease or in response to a perturbation).
Functional genomics may also be considered a sub-field of systems biology.

The systems biology approach often involves the development of mechanistic models, such
as the reconstruction of dynamic systems from the quantitative properties of their
elementary building blocks. For instance, a cellular network can be modelled
mathematically using methods coming from chemical kinetics and control theory. Due to
the large number of parameters, variables and constraints in cellular networks, numerical
and computational techniques are often used. Other aspects of computer science and
informatics are also used in systems biology. These include new forms of computational
model, such as the use of process calculi to model biological processes; the integration of
information from the literature, using techniques of information extraction and text mining;
the development of online databases and repositories for sharing data and models (such as
BioModels Database); approaches to database integration and software interoperability via
loose coupling of software, websites and databases; and the development of syntactically
and semantically sound ways of representing biological models, such as the Systems
Biology Markup Language (SBML).
See also

Related fields

• Complex systems biology
• Complex systems
• Bioinformatics
• Biological network inference
• Biological systems engineering
• Biomedical cybernetics
• Biostatistics
• Theoretical Biophysics
• Relational Biology
• Translational Research
• Computational biology
• Computational systems biology
• Scotobiology
• Synthetic biology
• Systems biology modeling
• Systems ecology
• Systems immunology

Related terms

• Life
• Artificial life
• Gene regulatory network
• Metabolic network modelling
• Living systems theory
• Network Theory of Aging
• Regulome
• Systems Biology Markup Language (SBML)
• SBO
• Viable System Model
• Antireductionism

Systems biologists

• Category: Systems biologists

Lists

• List of systems biology conferences
• List of omics topics in biology
• List of publications in systems biology
• List of systems biology research groups
Biotechnology is technology based on biology, especially when used in agriculture, food
science, and medicine. The United Nations Convention on Biological Diversity defines
biotechnology as:

Any technological application that uses biological systems, living organisms, or
derivatives thereof, to make or modify products or processes for specific use.

Biotechnology is often used to refer to genetic engineering technology of the 21st
century; however, the term encompasses a wider range and history of procedures for
modifying biological organisms according to the needs of humanity, going back to the
initial modifications of native plants into improved food crops through artificial
selection and hybridization. Bioengineering is the science upon which all biotechnological
applications are based. With the development of new approaches and modern techniques,
traditional biotechnology industries are also acquiring new horizons, enabling them to
improve the quality of their products and increase the productivity of their systems.

Before 1971, the term biotechnology was primarily used in agriculture and related
industries. Since the 1970s, it has been used by the Western scientific establishment to
refer to laboratory-based techniques being developed in biological research, such as
recombinant DNA or tissue culture-based processes, or horizontal gene transfer in living
plants, using vectors such as the Agrobacterium bacteria to transfer DNA into a host
organism. In fact, the term should be used in a much broader sense to describe the whole
range of methods, both ancient and modern, used to manipulate organic materials to meet
the demands of food production. So the term could be defined as, "The application of
indigenous and/or scientific knowledge to the management of (parts of) microorganisms, or
of cells and tissues of higher organisms, so that these supply goods and services of use to
the food industry and its consumers."

Biotechnology combines disciplines like genetics, molecular biology, biochemistry, 
embryology, and cell biology, which are in turn linked to practical disciplines like chemical 
engineering, information technology, and biorobotics. Patho-biotechnology describes the 
exploitation of pathogens or pathogen derived compounds for beneficial effect. 
History

Although not normally thought of as 
biotechnology, agriculture clearly fits the 
broad definition of "using a biological 
system to make products" such that the 
cultivation of plants may be viewed as the 
earliest biotechnological enterprise. 
Agriculture has been theorized to have 
become the dominant way of producing 
food since the Neolithic Revolution. The 
processes and methods of agriculture have 
been refined by other mechanical and 
biological sciences since its inception. 
Through early biotechnology, farmers were 
able to select the best suited and 
highest-yield crops to produce enough food 
to support a growing population. Other 
uses of biotechnology were reguired as 
crops and fields became increasingly large 
and difficult to maintain. Specific 
organisms and organism by-products were 
used to fertilize, restore nitrogen, and 
control pests. Throughout the use of 
agriculture, farmers have inadvertently altered the genetics of their crops through 
introducing them to new environments and breeding them with other plants— one of the 
first forms of biotechnology. Cultures such as those in Mesopotamia, Egypt, and India 
developed the process of brewing beer. It is still done by the same basic method of using 
malted grains (containing enzymes) to convert starch from grains into sugar and then 
adding specific yeasts to produce beer. In this process the carbohydrates in the grains were 
broken down into alcohols such as ethanol. Ancient Indians also used the juices of the plant
Ephedra vulgaris, which they called Soma. Later, other cultures developed the process of
lactic acid fermentation, which allowed the fermentation and preservation of other forms of
food. Fermentation was also used in this time period to produce leavened bread. Although
the process of fermentation was not fully understood until Louis Pasteur's work in 1857, it 
is still the first use of biotechnology to convert a food source into another form. 




Brewing was an early application of biotechnology 



Combinations of plants and other organisms were used as medications in many early 
civilizations. As early as 200 BC, people began to use disabled or minute amounts of
infectious agents to immunize themselves against infections. These and similar processes 
have been refined in modern medicine and have led to many developments such as 
antibiotics, vaccines, and other methods of fighting sickness. 

In the early twentieth century scientists gained a greater understanding of microbiology 
and explored ways of manufacturing specific products. In 1917, Chaim Weizmann first used
a pure microbiological culture in an industrial process: the fermentation of corn starch
by Clostridium acetobutylicum to produce acetone, which the United Kingdom
desperately needed to manufacture explosives during World War I.

The field of modern biotechnology is thought to have largely begun on June 16, 1980, when
the United States Supreme Court ruled that a genetically modified microorganism could be
patented in the case of Diamond v. Chakrabarty. Indian-born Ananda Chakrabarty,
working for General Electric, had developed a bacterium (derived from the Pseudomonas
genus) capable of breaking down crude oil, which he proposed to use in treating oil spills.

Revenue in the industry is expected to grow by 12.9% in 2008. Another factor influencing 
the biotechnology sector's success is improved intellectual property rights legislation— and 
enforcement— worldwide, as well as strengthened demand for medical and pharmaceutical 
products to cope with an ageing, and ailing, U.S. population. 

Rising demand for biofuels is expected to be good news for the biotechnology sector, with 
the Department of Energy estimating ethanol usage could reduce U.S. petroleum-derived 
fuel consumption by up to 30% by 2030. The biotechnology sector has allowed the U.S. 
farming industry to rapidly increase its supply of corn and soybeans— the main inputs into 
biofuels— by developing genetically-modified seeds which are resistant to pests and 
drought. By boosting farm productivity, biotechnology plays a crucial role in ensuring that 
biofuel production targets are met. 



Applications 



Biotechnology has applications in four major 
industrial areas, including health care 
(medical), crop production and agriculture, 
non-food (industrial) uses of crops and other
products (e.g. biodegradable plastics, 
vegetable oil, biofuels), and environmental 
uses. 

For example, one application of 
biotechnology is the directed use of 
organisms for the manufacture of organic 
products (examples include beer and milk 
products). Another example is using 
naturally present bacteria by the mining 
industry in bioleaching. Biotechnology is also 
used to recycle, treat waste, clean up sites 
contaminated by industrial activities 
(bioremediation), and also to produce 
biological weapons. 

A series of derived terms have been coined 
to identify several branches of 
biotechnology, for example: 

• Bioinformatics is an interdisciplinary field which addresses biological problems using
computational techniques, and makes the rapid organization and analysis of biological data
possible. The field may also be referred to as computational biology, and can be defined as,
"conceptualizing biology in terms of molecules and then applying informatics techniques to
understand and organize the information associated with these molecules, on a large scale."
Bioinformatics plays a key role in various areas, such as functional genomics, structural
genomics, and proteomics, and forms a key component in the biotechnology and
pharmaceutical sector.

A rose plant that began as cells grown in a tissue culture

• Blue biotechnology is a term that has been used to describe the marine and aquatic 
applications of biotechnology, but its use is relatively rare. 

• Green biotechnology is biotechnology applied to agricultural processes. An example 
would be the selection and domestication of plants via micropropagation. Another 
example is the designing of transgenic plants to grow under specific environmental 
conditions or in the presence (or absence) of certain agricultural chemicals. One hope is 
that green biotechnology might produce more environmentally friendly solutions than 
traditional industrial agriculture. One example is the engineering of a plant to
express a pesticide, as in Bt corn, thereby eliminating the need for external application of
pesticides. Whether or not green biotechnology products such
as this are ultimately more environmentally friendly is a topic of considerable debate. 

• Red biotechnology is applied to medical processes. Some examples are the designing of 
organisms to produce antibiotics, and the engineering of genetic cures through genomic 
manipulation. 

• White biotechnology, also known as industrial biotechnology, is biotechnology applied 
to industrial processes. An example is the designing of an organism to produce a useful 
chemical. Another example is the using of enzymes as industrial catalysts to either 
produce valuable chemicals or destroy hazardous/polluting chemicals. White 
biotechnology tends to consume less in resources than traditional processes used to 
produce industrial goods. 

• The investments and economic output of all of these types of applied biotechnologies 
form what has been described as the bioeconomy. 

Medicine 

In medicine, modern biotechnology finds promising applications in such areas as 

• drug production; 

• pharmacogenomics; 

• gene therapy; and

• genetic testing.

Pharmacogenomics

Pharmacogenomics is the study of how the 
genetic inheritance of an individual affects 
his/her body's response to drugs. It is a 
coined word derived from the words 
"pharmacology" and "genomics". It is 
hence the study of the relationship between 
pharmaceuticals and genetics. The vision of 
pharmacogenomics is to be able to design 
and produce drugs that are adapted to each 
person's genetic makeup. [8]

DNA Microarray chip - Some can do as many as a million blood tests at once

Pharmacogenomics results in the following benefits:

• Development of tailor-made medicines. Using pharmacogenomics, pharmaceutical
companies can create drugs based on the proteins, enzymes and RNA molecules that are
associated with specific genes and diseases. These tailor-made drugs promise not only to
maximize therapeutic effects but also to decrease damage to nearby healthy cells.

• More accurate methods of determining appropriate drug dosages. Knowing a patient's
genetics will enable doctors to determine how well his/her body can process and
metabolize a medicine. This will maximize the value of the medicine and decrease the
likelihood of overdose.

• Improvements in the drug discovery and approval process. The discovery of potential
therapies will be made easier using genome targets. Genes have been associated with
numerous diseases and disorders. With modern biotechnology, these genes can be used
as targets for the development of effective new therapies, which could significantly
shorten the drug discovery process.

• Better vaccines. Safer vaccines can be designed and produced by organisms
transformed by means of genetic engineering. These vaccines will elicit the immune
response without the attendant risks of infection. They will be inexpensive, stable, easy to
store, and capable of being engineered to carry several strains of pathogen at once.

Computer-generated image of insulin hexamers highlighting the threefold symmetry, the zinc
ions holding it together, and the histidine residues involved in zinc binding.

Pharmaceutical products 

Most traditional pharmaceutical drugs are relatively simple molecules that have been found
primarily through trial and error to treat the symptoms of a disease or illness.
Biopharmaceuticals are large biological molecules, typically proteins, that usually target the
underlying mechanisms and pathways of a malady. There are exceptions: using insulin to treat
type 1 diabetes mellitus merely addresses the symptoms of the disease, not the underlying
cause, which is autoimmunity. Biopharmaceuticals are a relatively young industry and can deal
with targets in humans that may not be accessible with traditional medicines. A patient
typically is dosed with a small molecule via a tablet, while a large molecule is typically
injected.

Small molecules are manufactured by chemical synthesis, but larger molecules are created by
living cells such as those found in the human body: for example, bacterial cells, yeast cells,
animal or plant cells.

Modern biotechnology is often associated with the use of genetically altered 
microorganisms such as E. coli or yeast for the production of substances like synthetic 
insulin or antibiotics. It can also refer to transgenic animals or transgenic plants, such as Bt 
corn. Genetically altered mammalian cells, such as Chinese Hamster Ovary (CHO) cells, are 
also used to manufacture certain pharmaceuticals. Another promising new biotechnology 
application is the development of plant-made pharmaceuticals. 

Biotechnology is also commonly associated with landmark breakthroughs in new medical 
therapies to treat hepatitis B, hepatitis C, cancers, arthritis, haemophilia, bone fractures, 
multiple sclerosis, and cardiovascular disorders. The biotechnology industry has also been 
instrumental in developing molecular diagnostic devices that can be used to define the 
target patient population for a given biopharmaceutical. Herceptin, for example, was the 
first drug approved for use with a matching diagnostic test and is used to treat breast 
cancer in women whose cancer cells express the protein HER2. 

Modern biotechnology can be used to manufacture existing medicines relatively easily and 
cheaply. The first genetically engineered products were medicines designed to treat human 
diseases. To cite one example, in 1978 Genentech developed synthetic humanized insulin by 
joining its gene with a plasmid vector inserted into the bacterium Escherichia coli. Insulin, 
widely used for the treatment of diabetes, was previously extracted from the pancreas of 
abattoir animals (cattle and/or pigs). The resulting genetically engineered bacterium
enabled the production of vast quantities of synthetic human insulin at relatively low cost,
although the cost savings were used to increase profits for manufacturers, not passed on to
consumers or their healthcare providers. According to a 2003 study undertaken by the
International Diabetes Federation (IDF) on the access to and availability of insulin in its
member countries, synthetic 'human' insulin is considerably more expensive in most
countries where both synthetic 'human' and animal insulin are commercially available: e.g.
within European countries the average price of synthetic 'human' insulin was twice as high
as the price of pork insulin. Yet in its position statement, the IDF writes that "there is no
overwhelming evidence to prefer one species of insulin over another" and "[modern,
highly-purified] animal insulins remain a perfectly acceptable alternative".

Modern biotechnology has evolved, making it possible to produce human growth hormone,
clotting factors for hemophiliacs, fertility drugs, erythropoietin and other drugs more easily
and relatively cheaply. [12] Most drugs today are based on about 500 molecular targets.
Genomic knowledge of the genes involved in diseases, disease pathways, and drug-response
sites is expected to lead to the discovery of thousands more new targets. [12]



Genetic testing 




Gel electrophoresis 



Genetic testing involves the direct 
examination of the DNA molecule itself.
A scientist scans a patient's DNA sample 
for mutated sequences. 

There are two major types of gene tests. In the first type, a researcher may design short
pieces of DNA ("probes") whose sequences are complementary to the mutated sequences.
These probes will seek their complement among the base pairs of an individual's genome. If
the mutated sequence is present in the patient's genome, the probe will bind to it and flag
the mutation. In the second type, a researcher may conduct the gene test by comparing the
sequence of DNA bases in a patient's gene to the sequence of the same gene in healthy
individuals or their progeny.
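
The probe-based test can be pictured as a string search. The following is a minimal sketch, not a laboratory protocol: the sequences are invented for illustration, the function names reverse_complement and probe_binding_sites are hypothetical, and real hybridization depends on temperature, partial mismatches, and both DNA strands, all of which are ignored here.

```python
# Illustrative sketch only: sequences and function names are assumptions,
# not taken from the source text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would base-pair with `seq`, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_binding_sites(genome_strand, probe):
    """Return 0-based positions where the probe's complement occurs exactly."""
    target = reverse_complement(probe)
    strand = genome_strand.upper()
    return [
        i
        for i in range(len(strand) - len(target) + 1)
        if strand.startswith(target, i)
    ]


# Invented "mutated" sequence and a probe designed to be complementary to it.
mutated = "GATTACATTG"
probe = reverse_complement(mutated)

patient = "CCCGATTACATTGAAA"  # carries the invented mutated sequence
healthy = "CCCGATTACAGTGAAA"  # differs by one base, so the probe finds no site

patient_hits = probe_binding_sites(patient, probe)
healthy_hits = probe_binding_sites(healthy, probe)
print(patient_hits, healthy_hits)
```

A non-empty result corresponds to the probe "flagging" the mutation; an empty result means no exact complement was found in that strand.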

Genetic testing is now used for: 

• Carrier screening, or the identification of unaffected individuals who carry one copy of a 
gene for a disease that requires two copies for the disease to manifest; 

• Confirmational diagnosis of symptomatic individuals; 

• Determining sex; 

• Forensic/identity testing; 

• Newborn screening; 

• Prenatal diagnostic screening; 

• Presymptomatic testing for estimating the risk of developing adult-onset cancers; 

• Presymptomatic testing for predicting adult-onset disorders. 
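
The carrier-screening item in the list above concerns diseases that require two copies of a gene to manifest. As a hedged illustration, assuming simple single-gene recessive inheritance in which each parent passes on one of its two copies with equal probability, the outcomes for the children of two carriers can be enumerated as follows. The allele labels and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Enumerate equally likely allele pairings from two parents.

    Each parent is a two-letter string such as "Aa"; this assumes simple
    single-gene inheritance, which is an idealization.
    """
    combos = ["".join(sorted(pair)) for pair in product(parent1, parent2)]
    return {g: Fraction(combos.count(g), len(combos)) for g in set(combos)}


# 'A' = working copy, 'a' = disease copy (labels chosen for illustration).
carrier = "Aa"
result = offspring_genotypes(carrier, carrier)
for genotype in sorted(result):
    print(genotype, result[genotype])
```

Under these assumptions, "aa" (two disease copies) is the only genotype in which the disease would manifest, while "Aa" children are themselves unaffected carriers.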

Some genetic tests are already available, although most of them are used in developed 
countries. The tests currently available can detect mutations associated with rare genetic 
disorders like cystic fibrosis, sickle cell anemia, and Huntington's disease. Recently, tests 
have been developed to detect mutations for a handful of more complex conditions such as
breast, ovarian, and colon cancers. However, gene tests may not detect every mutation
associated with a particular condition because many are as yet undiscovered, and the ones
they do detect may present different risks to different people and populations. [12]

The bacterium Escherichia coli is routinely genetically engineered.



Controversial questions 

Several issues have been raised regarding 
the use of genetic testing: 

1. Absence of cure. There is still a lack of 
effective treatment or preventive 
measures for many diseases and 
conditions now being diagnosed or 
predicted using gene tests. Thus, 
revealing information about risk of a 
future disease that has no existing cure 
presents an ethical dilemma for medical 
practitioners. 

2. Ownership and control of genetic 
information. Who will own and control 
genetic information, or information about 
genes, gene products, or inherited characteristics derived from an individual or a group 
of people like indigenous communities? At the macro level, there is a possibility of a 
genetic divide, with developing countries that do not have access to medical applications 
of biotechnology being deprived of benefits accruing from products derived from genes 
obtained from their own people. Moreover, genetic information can pose a risk for 
minority population groups as it can lead to group stigmatization. 

At the individual level, the absence of privacy and anti-discrimination legal protections in 
most countries can lead to discrimination in employment or insurance or other misuse of 
personal genetic information. This raises questions such as whether genetic privacy is 
different from medical privacy. 

3. Reproductive issues. These include the use of genetic information in reproductive
decision-making and the possibility of genetically altering reproductive cells that may be
passed on to future generations. For example, germline therapy forever changes the
genetic make-up of an individual's descendants. Thus, any error in technology or
judgment may have far-reaching consequences. Ethical issues like designer babies and
human cloning have also given rise to controversies between and among scientists and
bioethicists, especially in the light of past abuses with eugenics.

4. Clinical issues. These center on the capabilities and limitations of doctors and other
health-service providers, people identified with genetic conditions, and the general public
in dealing with genetic information.

5. Effects on social institutions. Genetic tests reveal information about individuals and their
families. Thus, test results can affect the dynamics within social institutions, particularly
the family.

6. Conceptual and philosophical implications regarding human responsibility, free will
vis-a-vis genetic determinism, and the concepts of health and disease.

Gene therapy

Gene therapy may be used for treating, or 
even curing, genetic and acquired diseases 
like cancer and AIDS by using normal 
genes to supplement or replace defective 
genes or to bolster a normal function such 
as immunity. It can be used to target
somatic (i.e., body) cells or gametes (i.e., egg
and sperm cells). In somatic gene therapy,
the genome of the recipient is changed, but
this change is not passed along to the next
generation. In contrast, in germline gene
therapy, the egg and sperm cells of the
parents are changed for the purpose of
passing on the changes to their offspring.

Gene therapy using an Adenovirus vector. A new gene is inserted into an adenovirus vector,
which is used to introduce the modified DNA into a human cell. If the treatment is successful,
the new gene will make a functional protein.



There are basically two ways of 
implementing a gene therapy treatment: 

1. Ex vivo, which means "outside the body" - Cells from the patient's blood or bone marrow 
are removed and grown in the laboratory. They are then exposed to a virus carrying the 
desired gene. The virus enters the cells, and the desired gene becomes part of the DNA 
of the cells. The cells are allowed to grow in the laboratory before being returned to the 
patient by injection into a vein. 

2. In vivo, which means "inside the body" - No cells are removed from the patient's body. 
Instead, vectors are used to deliver the desired gene to cells in the patient's body. 

Currently, the use of gene therapy is limited. Somatic gene therapy is primarily at the 
experimental stage. Germline therapy is the subject of much discussion but it is not being 
actively investigated in larger animals and human beings. 

As of June 2001, more than 500 clinical gene-therapy trials involving about 3,500 patients
had been identified worldwide. Around 78% of these were in the United States, with Europe
having 18%. These trials focus on various types of cancer, although other multigenic 
diseases are being studied as well. Recently, two children born with severe combined 
immunodeficiency disorder ("SCID") were reported to have been cured after being given 
genetically engineered cells. 

Gene therapy faces many obstacles before it can become a practical approach for treating 
disease. At least four of these obstacles are as follows: 

1. Gene delivery tools. Genes are inserted into the body using gene carriers called vectors. 
The most common vectors now are viruses, which have evolved a way of encapsulating 
and delivering their genes to human cells in a pathogenic manner. Scientists manipulate 
the genome of the virus by removing the disease-causing genes and inserting the 
therapeutic genes. However, while viruses are effective, they can introduce problems like 
toxicity, immune and inflammatory responses, and gene control and targeting issues. In 
addition, in order for gene therapy to provide permanent therapeutic effects, the 
introduced gene needs to be integrated within the host cell's genome. Some viral vectors 
effect this in a random fashion, which can introduce other problems such as disruption of 
an endogenous host gene.

2. High costs. Since gene therapy is relatively new and at an experimental stage, it is an
expensive treatment to undertake. This explains why current studies are focused on 
illnesses commonly found in developed countries, where more people can afford to pay 
for treatment. It may take decades before developing countries can take advantage of 
this technology. 

3. Limited knowledge of the functions of genes. Scientists currently know the functions of 
only a few genes. Hence, gene therapy can address only some genes that cause a 
particular disease. Worse, it is not known exactly whether genes have more than one 
function, which creates uncertainty as to whether replacing such genes is indeed 
desirable. 

4. Multigene disorders and effect of environment. Most genetic disorders involve more 
than one gene. Moreover, most diseases involve the interaction of several genes and the 
environment. For example, many people with cancer not only inherit the disease gene for 
the disorder, but may have also failed to inherit specific tumor suppressor genes. Diet, 
exercise, smoking and other environmental factors may have also contributed to their 
disease. 



Human Genome Project 

The Human Genome Project is an initiative of the U.S. 
Department of Energy ("DOE") that aims to generate a 
high-quality reference sequence for the entire human 
genome and identify all the human genes. 

The DOE and its predecessor agencies were assigned 
by the U.S. Congress to develop new energy resources 
and technologies and to pursue a deeper 
understanding of potential health and environmental 
risks posed by their production and use. In 1986, the 
DOE announced its Human Genome Initiative. Shortly 
thereafter, the DOE and National Institutes of Health 
developed a plan for a joint Human Genome Project 
("HGP"), which officially began in 1990. 

The HGP was originally planned to last 15 years. 
However, rapid technological advances and worldwide 
participation accelerated the completion date to 2003 
(making it a 13-year project). Already it has enabled
gene hunters to pinpoint genes associated with more 
than 30 disorders. 

Cloning 

Cloning involves the removal of the nucleus from one 
cell and its placement in an unfertilized egg cell whose 
nucleus has either been deactivated or removed. 

There are two types of cloning: 

1. Reproductive cloning. After a few divisions, the egg cell is placed into a uterus where it 
is allowed to develop into a fetus that is genetically identical to the donor of the original 
nucleus.

2. Therapeutic cloning. The egg is placed into a Petri dish where it develops into
embryonic stem cells, which have shown potential for treating several ailments.

In February 1997, cloning became the focus of media attention when Ian Wilmut and his
colleagues at the Roslin Institute announced the successful cloning of a sheep, named
Dolly, from a mammary gland cell of an adult female. The cloning of Dolly made it apparent
to many that the techniques used to produce her could someday be used to clone human
beings. [18] This stirred a lot of controversy because of its ethical implications.

Agriculture 

Responsible biotechnology is not the enemy; starvation is. Without adequate food 
supplies at affordable prices, we cannot expect world health or peace. 

—Jimmy Carter, former President of the United States, 11 Jul 1997

Improved yield from crops

Using the techniques of modern biotechnology, one or two genes may be transferred to a 
highly developed crop variety to impart a new character that would increase its yield. 
However, while increases in crop yield are the most obvious applications of modern 
biotechnology in agriculture, it is also the most difficult one. Current genetic engineering 
techniques work best for effects that are controlled by a single gene. Many of the genetic 
characteristics associated with yield (e.g., enhanced growth) are controlled by a large 
number of genes, each of which has a minimal effect on the overall yield. There is, 
therefore, much scientific work to be done in this area. 

Reduced vulnerability of crops to environmental stresses 

Crops containing genes that will enable them to withstand biotic and abiotic stresses may 
be developed. For example, drought and excessively salty soil are two important limiting 
factors in crop productivity. Biotechnologists are studying plants that can cope with these 
extreme conditions in the hope of finding the genes that enable them to do so and 
eventually transferring these genes to the more desirable crops. One of the latest 
developments is the identification of a plant gene, At-DBF2, from thale cress, a tiny weed 
that is often used for plant research because it is very easy to grow and its genetic code is 
well mapped out. When this gene was inserted into tomato and tobacco cells (see RNA 
interference), the cells were able to withstand environmental stresses like salt, drought, 
cold and heat, far more than ordinary cells. If these preliminary results prove successful in 
larger trials, then At-DBF2 genes can help in engineering crops that can better withstand
harsh environments. Researchers have also created transgenic rice plants that are
resistant to rice yellow mottle virus (RYMV). In Africa, this virus destroys the majority of the
rice crops and makes the surviving plants more susceptible to fungal infections. [23]

Increased nutritional qualities and quantity of food crops

Proteins in foods may be modified to increase their nutritional qualities. Proteins in 
legumes and cereals may be transformed to provide the amino acids needed by human 
beings for a balanced diet. A good example is the work of Professors Ingo Potrykus and 
Peter Beyer on the so-called Golden rice (discussed below). 

Improved taste, texture or appearance of food 

Modern biotechnology can be used to slow down the process of spoilage so that fruit can 
ripen longer on the plant and then be transported to the consumer with a still reasonable 
shelf life. This alters the taste, texture and appearance of the fruit. More importantly, it 
could expand the market for farmers in developing countries due to the reduction in 
spoilage. However, there is sometimes a lack of understanding by researchers in developed 
countries about the actual needs of prospective beneficiaries in developing countries. For 
example, engineering soybeans to resist spoilage makes them less suitable for producing 
tempeh which is a significant source of protein that depends on fermentation. The use of 
modified soybeans results in a lumpy texture that is less palatable and less convenient 
when cooking. 

The first genetically modified food product was a tomato which was transformed to delay its
ripening. Researchers in Indonesia, Malaysia, Thailand, the Philippines and Vietnam are
currently working on delayed-ripening papaya in collaboration with the University of
Nottingham and Zeneca.

Biotechnology in cheese production: enzymes produced by micro-organisms provide an
alternative to animal rennet - a cheese coagulant - and an alternative supply for cheese
makers. This also eliminates possible public concerns with animal-derived material,
although there are currently no plans to develop synthetic milk, thus making this argument
less compelling. Enzymes offer an animal-friendly alternative to animal rennet. While
providing comparable quality, they are theoretically also less expensive.

About 85 million tons of wheat flour is used every year to bake bread. By adding an
enzyme called maltogenic amylase to the flour, bread stays fresher longer. Assuming that 
10-15% of bread is thrown away as stale, if it could be made to stay fresh another 5-7 days 
then perhaps 2 million tons of flour per year would be saved. Other enzymes can cause 
bread to expand to make a lighter loaf, or alter the loaf in a range of ways. 
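
The flour estimate above can be checked with simple arithmetic. This sketch assumes, purely for illustration, that flour use is proportional to the amount of bread baked; the three input figures come from the text, and the function name wasted_flour is hypothetical.

```python
# Figures from the text; the proportionality assumption is ours.
FLOUR_MT = 85.0              # million tons of wheat flour used per year
WASTE_SHARES = (0.10, 0.15)  # share of bread thrown away as stale
SAVED_MT = 2.0               # flour the text suggests could be saved


def wasted_flour(flour_mt, waste_share):
    """Flour (million tons) embodied in bread that is discarded as stale."""
    return flour_mt * waste_share


for share in WASTE_SHARES:
    wasted = wasted_flour(FLOUR_MT, share)
    fraction_avoided = SAVED_MT / wasted
    print(
        f"{share:.0%} waste: {wasted:.2f} Mt of flour; "
        f"saving {SAVED_MT:.0f} Mt means avoiding {fraction_avoided:.0%} of it"
    )
```

Under that assumption, stale bread accounts for roughly 8.5 to 12.75 million tons of flour per year, so a saving of 2 million tons would correspond to avoiding roughly 16% to 24% of that waste.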

Reduced dependence on fertilizers, pesticides and other agrochemicals 

Most of the current commercial applications of modern biotechnology in agriculture are on 
reducing the dependence of farmers on agrochemicals. For example, Bacillus thuringiensis 
(Bt) is a soil bacterium that produces a protein with insecticidal qualities. Traditionally, a 
fermentation process has been used to produce an insecticidal spray from these bacteria. In 
this form, the Bt toxin occurs as an inactive protoxin, which requires digestion by an insect 
to be effective. There are several Bt toxins and each one is specific to certain target insects. 
Crop plants have now been engineered to contain and express the genes for Bt toxin, which 
they produce in its active form. When a susceptible insect ingests the transgenic crop 
cultivar expressing the Bt protein, it stops feeding and soon thereafter dies as a result of 
the Bt toxin binding to its gut wall. Bt corn is now commercially available in a number of 
countries to control corn borer (a lepidopteran insect), which is otherwise controlled by 
spraying (a more difficult process).

Crops have also been genetically engineered to acquire tolerance to broad-spectrum
herbicides. The lack of cost-effective herbicides with broad-spectrum activity and no crop
injury was a consistent limitation in crop weed management. Multiple applications of 
numerous herbicides were routinely used to control a wide range of weed species 
detrimental to agronomic crops. Weed management tended to rely on preemergence — that 
is, herbicide applications were sprayed in response to expected weed infestations rather 
than in response to actual weeds present. Mechanical cultivation and hand weeding were 
often necessary to control weeds not controlled by herbicide applications. The introduction 
of herbicide tolerant crops has the potential of reducing the number of herbicide active 
ingredients used for weed management, reducing the number of herbicide applications 
made during a season, and increasing yield due to improved weed management and less 
crop injury. Transgenic crops that express tolerance to glyphosate, glufosinate and 
bromoxynil have been developed. These herbicides can now be sprayed on transgenic crops 
without inflicting damage on the crops while killing nearby weeds. 

From 1996 to 2001, herbicide tolerance was the most dominant trait introduced to 
commercially available transgenic crops, followed by insect resistance. In 2001, herbicide 
tolerance deployed in soybean, corn and cotton accounted for 77% of the 626,000 square 
kilometres planted to transgenic crops; Bt crops accounted for 15%; and "stacked genes" 
for herbicide tolerance and insect resistance used in both cotton and corn accounted for 
8%. [29] 
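
The 2001 percentages above can be converted into approximate planted areas. This is a small worked calculation using only the figures quoted in the text; the dictionary keys and the function name area_by_trait are illustrative choices, and integer arithmetic is used so the parts sum exactly to the total.

```python
TOTAL_KM2 = 626_000  # area planted to transgenic crops in 2001 (from the text)

# Percentage shares quoted in the text.
SHARES_PCT = {
    "herbicide tolerance": 77,
    "Bt (insect resistance)": 15,
    "stacked herbicide tolerance + insect resistance": 8,
}


def area_by_trait(total_km2, shares_pct):
    """Split a total area by whole-number percentage shares."""
    return {trait: total_km2 * pct // 100 for trait, pct in shares_pct.items()}


areas = area_by_trait(TOTAL_KM2, SHARES_PCT)
for trait, km2 in areas.items():
    print(f"{trait}: {km2:,} square kilometres")
```

The three shares account for the whole planted area, which is a quick consistency check on the quoted percentages.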

Production of novel substances in crop plants 

Biotechnology is being applied for novel uses other than food. For example, oilseed can be 
modified to produce fatty acids for detergents, substitute fuels and petrochemicals. 
Potatoes, tomatoes, rice, tobacco, lettuce, safflowers, and other plants have been 
genetically engineered to produce insulin and certain vaccines. If future clinical trials prove 
successful, the advantages of edible vaccines would be enormous, especially for developing 
countries. The transgenic plants may be grown locally and cheaply. Homegrown vaccines 
would also avoid logistical and economic problems posed by having to transport traditional 
preparations over long distances and keeping them cold while in transit. And since they are 
edible, they will not need syringes, which are not only an additional expense in the 
traditional vaccine preparations but also a source of infections if contaminated. In the 
case of insulin grown in transgenic plants, it is well established that the gastrointestinal 
system breaks the protein down, so it could not currently be administered as an 
edible protein. However, it might be produced at significantly lower cost than insulin 
produced in costly bioreactors. For example, Calgary, Canada-based SemBioSys Genetics, 
Inc. reports that its safflower-produced insulin will reduce unit costs by over 25% 
and estimates a reduction of over $100 million in the capital costs of building a 
commercial-scale insulin manufacturing facility, compared to traditional 
biomanufacturing facilities. 

Criticism 

There is another side to the agricultural biotechnology issue. Concerns include increased 
herbicide usage and resultant herbicide resistance, "super weeds," residues on and in food 
crops, genetic contamination of non-GM crops, which hurts organic and conventional 
farmers, and damage to wildlife from glyphosate. 

Biological engineering 

Biotechnological engineering or biological engineering is a branch of engineering that 
focuses on biotechnologies and biological science. It includes different disciplines such as 
biochemical engineering, biomedical engineering, bio-process engineering, biosystem 
engineering and so on. Because of the novelty of the field, the role of a bioengineer is 
not yet precisely defined. In general, however, biological engineering integrates 
fundamental biological sciences with traditional engineering principles. 

Bioengineers are often employed to scale up bioprocesses from the laboratory scale to the 
manufacturing scale. Moreover, as with most engineers, they often deal with management, 
economic and legal issues. Since patents and regulation (e.g., by the Food and Drug 
Administration in the U.S.) are very important issues for biotech enterprises, 
bioengineers are often required to have knowledge related to these issues. 

The increasing number of biotech enterprises is likely to create a need for bioengineers in 
the years to come. Many universities throughout the world are now providing programs in 
bioengineering and biotechnology (as independent programs or specialty programs within 
more established engineering fields). 

Bioremediation and Biodegradation 

Biotechnology is being used to engineer and adapt organisms, especially microorganisms, in 
an effort to find sustainable ways to clean up contaminated environments. The elimination 
of a wide range of pollutants and wastes from the environment is an absolute requirement 
to promote a sustainable development of our society with low environmental impact. 
Biological processes play a major role in the removal of contaminants and biotechnology is 
taking advantage of the astonishing catabolic versatility of microorganisms to 
degrade/convert such compounds. New methodological breakthroughs in sequencing, 
genomics, proteomics, bioinformatics and imaging are producing vast amounts of 
information. In the field of Environmental Microbiology, genome-based global studies open 
a new era providing unprecedented in silico views of metabolic and regulatory networks, as 
well as clues to the evolution of degradation pathways and to the molecular adaptation 
strategies to changing environmental conditions. Functional genomic and metagenomic 
approaches are increasing our understanding of the relative importance of different 
pathways and regulatory networks to carbon flux in particular environments and for 
particular compounds and they will certainly accelerate the development of bioremediation 
technologies and biotransformation processes. 

Marine environments are especially vulnerable, since oil spills in coastal regions and the 
open sea are poorly containable and mitigation is difficult. In addition to pollution through 
human activities, millions of tons of petroleum enter the marine environment every year 
from natural seepages. Despite its toxicity, a considerable fraction of petroleum oil entering 
marine systems is eliminated by the hydrocarbon-degrading activities of microbial 
communities, in particular by a remarkable recently discovered group of specialists, the 
so-called hydrocarbonoclastic bacteria (HCCB). 

Education 

In 1988, after prompting from the United States Congress, the National Institute of General 
Medical Sciences (National Institutes of Health) instituted a funding mechanism for 
biotechnology training. Universities nationwide compete for these funds to establish 
Biotechnology Training Programs (BTPs). Each successful application is generally funded 
for five years and must then be competitively renewed. Graduate students in turn compete for 
acceptance into a BTP. If accepted, they receive stipend, tuition and health insurance 
support for two or three years during the course of their PhD thesis work. One example is 
the Biotechnology Training Program at the University of Virginia. Eighteen other institutions 
offer NIGMS-supported BTPs [37]. Biotechnology training is also offered at the 
undergraduate level and in community colleges. Examples include the Biotechnology 
Major [38] at James Madison University and the Biotechnology Career Studies 
Certificate [39] at Piedmont Virginia Community College. 

Notable researchers and individuals 

Canada : Frederick Banting, Lap-Chee Tsui, Tak Wah Mak, Lorne Babiuk 
Europe : Francis Crick, Jacques Monod, Paul Nurse, Ingo Potrykus, Ralf Reski, Arpad 
Pusztai, Werner Arber 
Finland : Leena Palotie 
Iceland : Kari Stefansson 
India : Kiran Mazumdar-Shaw (Biocon) 
Ireland : Timothy O'Brien, Dermot P Kelleher 
Mexico : Francisco Bolivar Zapata, Luis Herrera-Estrella 
U.S. : Roger Beachy, David Botstein, Herbert Boyer, Sydney Brenner, James J. Collins, 
Leroy Hood, Eric Lander, Robert Langer, Thomas Okarma, Craig Venter, James D. 
Watson, Michael West 
Zimbabwe : Christopher Chetsanga 

See also 

Bioeconomics 
Biomimetics 
Biotechnology industrial park 
Bionic architecture 
Green Revolution 
Genetic Engineering 
International Assessment of Agricultural Science and Technology for Development 
International Service for the Acquisition of Agri-biotech Applications 
List of biotechnology companies 
List of emerging technologies 
NASDAQ Biotechnology Index 
SWORD-financing 

Bioinformatics is the application of information technology to the field of molecular 
biology. The term bioinformatics was coined by Paulien Hogeweg in 1978 for the study of 
informatic processes in biotic systems. Bioinformatics nowadays entails the creation and 
advancement of databases, algorithms, computational and statistical techniques, and 
theory to solve formal and practical problems arising from the management and analysis 
of biological data. 

Over the past few decades rapid developments in genomic and other molecular research 
technologies and developments in information technologies have combined to produce a 
tremendous amount of information related to molecular biology. Bioinformatics is the name 
given to the mathematical and computing approaches used to glean understanding of 
biological processes. Common activities in bioinformatics include mapping and analyzing 
DNA and protein sequences, aligning different DNA and protein sequences to compare 
them and creating and viewing 3-D models of protein structures. 

Map of the human X chromosome (from the NCBI website). Assembly of the human genome 
is one of the greatest achievements of bioinformatics. 

The primary goal of bioinformatics is to increase our understanding of biological processes. 
What sets it apart from other approaches, however, is its focus on developing and applying 
computationally intensive techniques (e.g., data mining and machine learning algorithms) 
to achieve this goal. Major research efforts in the field include sequence alignment, gene 
finding, genome assembly, protein structure alignment, protein structure prediction, 
prediction of gene expression and protein-protein interactions, genome-wide association 
studies and the modeling of evolution. 

Introduction 

At the beginning of the "genomic revolution", bioinformatics was applied in the creation 
and maintenance of databases to store biological information, such as nucleotide and 
amino acid sequences. Development of this type of database involved not only design issues 
but also the development of complex interfaces whereby researchers could both access 
existing data and submit new or revised data. 

In order to study how normal cellular activities are altered in different disease states, the 
biological data must be combined to form a comprehensive picture of these activities. 
Therefore, the field of bioinformatics has evolved such that the most pressing task now 
involves the analysis and interpretation of various types of data, including nucleotide and 
amino acid sequences, protein domains, and protein structures. The actual process of 
analyzing and interpreting data is referred to as computational biology. Important 
sub-disciplines within bioinformatics and computational biology include: 

a) the development and implementation of tools that enable efficient access to, and use and 
management of, various types of information; 

b) the development of new algorithms (mathematical formulas) and statistics with which to 
assess relationships among members of large data sets, such as methods to locate a gene 
within a sequence, predict protein structure and/or function, and cluster protein sequences 
into families of related sequences. 

Major research areas 
Sequence analysis 

Since the phage Φ-X174 was sequenced in 1977, the DNA sequences of hundreds of 
organisms have been decoded and stored in databases. The information is analyzed to 
determine genes that encode polypeptides, as well as regulatory sequences. A comparison 
of genes within a species or between different species can show similarities between 
protein functions, or relations between species (the use of molecular systematics to 
construct phylogenetic trees). With the growing amount of data, it long ago became 
impractical to analyze DNA sequences manually. Today, computer programs are used to 
search the genomes of thousands of organisms, containing billions of nucleotides. These 
programs compensate for mutations (exchanged, deleted or inserted bases) in the 
DNA sequence, in order to identify sequences that are related, but not identical. A variant 
of this sequence alignment is used in the sequencing process itself. The so-called shotgun 
sequencing technique (which was used, for example, by The Institute for Genomic Research 
to sequence the first bacterial genome, Haemophilus influenzae) does not give a sequential 
list of nucleotides, but instead the sequences of thousands of small DNA fragments (each 
about 600-800 nucleotides long). The ends of these fragments overlap and, when aligned in 
the right way, make up the complete genome. Shotgun sequencing yields sequence data 
quickly, but the task of assembling the fragments can be quite complicated for larger 
genomes. In the case of the Human Genome Project, it took several days of CPU time (on 
one hundred Pentium III desktop machines clustered specifically for the purpose) to 
assemble the fragments. Shotgun sequencing is the method of choice for virtually all 
genomes sequenced today, and genome assembly algorithms are a critical area of 
bioinformatics research. 
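
The idea of aligning overlapping fragment ends can be illustrated with a toy greedy assembler. This is only a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats); it is not the algorithm used by any particular sequencing project, and the function names and example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave the fragments as separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical reads covering the toy sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGC"]
contigs = greedy_assemble(reads)
print(contigs)
```

The pairwise comparison makes this sketch quadratic in the number of fragments per merge, which hints at why assembly becomes costly for larger genomes.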

Another aspect of bioinformatics in sequence analysis is the automatic search for genes and 
regulatory sequences within a genome. Not all of the nucleotides within a genome are 
genes. Within the genome of higher organisms, large parts of the DNA do not serve any 
obvious purpose. This so-called junk DNA may, however, contain unrecognized functional 
elements. Bioinformatics helps to bridge the gap between genome and proteome 
projects, for example, in the use of DNA sequences for protein identification. 

See also: sequence analysis, sequence profiling tool, sequence motif. 

Genome annotation 

In the context of genomics, annotation is the process of marking the genes and other 
biological features in a DNA sequence. The first genome annotation software system was 
designed in 1995 by Dr. Owen White, who was part of the team that sequenced and 
analyzed the first genome of a free-living organism to be decoded, the bacterium 
Haemophilus influenzae. Dr. White built a software system to find the genes (places in the 
DNA sequence that encode a protein), the transfer RNA, and other features, and to make 
initial assignments of function to those genes. Most current genome annotation systems 
work similarly, but the programs available for analysis of genomic DNA are constantly 
changing and improving. 
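
As a rough illustration of what "finding the places in the DNA sequence that encode a protein" can mean computationally, the sketch below scans the three forward reading frames for open reading frames (a start codon followed in frame by a stop codon). It is a deliberately simplified assumption-laden example, not the method used by the annotation system described above: it ignores the reverse strand, introns and any statistical gene model, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for ATG...stop stretches on the forward strand.

    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


example = "CCATGGCTGAATTTTAGGG"
orfs = find_orfs(example)
print(orfs)
```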

Computational evolutionary biology 

Evolutionary biology is the study of the origin and descent of species, as well as their 
change over time. Informatics has assisted evolutionary biologists in several key ways; it 
has enabled researchers to: 

• trace the evolution of a large number of organisms by measuring changes in their 
DNA, rather than through physical taxonomy or physiological observations alone, 

• more recently, compare entire genomes, which permits the study of more complex 
evolutionary events, such as gene duplication, horizontal gene transfer, and the 
prediction of factors important in bacterial speciation, 

• build complex computational models of populations to predict the outcome of the system 
over time, 

• track and share information on an increasingly large number of species and organisms. 

Future work endeavours to reconstruct the now more complex tree of life. 
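
Measuring changes in DNA between organisms, as mentioned in the first point above, can be sketched with the simplest possible distance: the proportion of differing positions between sequences that have already been aligned. The sequences and names below are hypothetical, and real analyses generally use substitution models rather than this raw proportion.

```python
def p_distance(a, b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical pre-aligned sequences from three organisms.
aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACCTACGATC",
}

names = sorted(aligned)
matrix = {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}

for m in names:
    print(m, [round(matrix[(m, n)], 2) for n in names])
```

A distance matrix of this kind is the usual input to distance-based tree-building methods.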

The area of research within computer science that uses genetic algorithms is sometimes 
confused with computational evolutionary biology, but the two areas are unrelated. 

Measuring biodiversity 

Biodiversity of an ecosystem might be defined as the total genomic complement of a 
particular environment, from all of the species present, whether it is a biofilm in an 
abandoned mine, a drop of sea water, a scoop of soil, or the entire biosphere of the planet 
Earth. Databases are used to collect the species names, descriptions, distributions, genetic 
information, status and size of populations, habitat needs, and how each organism interacts 
with other species. Specialized software programs are used to find, visualize, and analyze 
the information, and most importantly, communicate it to other people. Computer 
simulations model such things as population dynamics, or calculate the cumulative genetic 
health of a breeding pool (in agriculture) or endangered population (in conservation). One 
very exciting potential of this field is that entire DNA sequences, or genomes, of 
endangered species can be preserved, allowing the results of Nature's genetic experiment 
to be remembered in silico, and possibly reused in the future, even if that species is 
eventually lost. 

Analysis of gene expression 

The expression of many genes can be determined by measuring mRNA levels with multiple 
techniques including microarrays, expressed cDNA sequence tag (EST) sequencing, serial 
analysis of gene expression (SAGE) tag sequencing, massively parallel signature 
sequencing (MPSS), or various applications of multiplexed in-situ hybridization. All of these 
techniques are extremely noise-prone and/or subject to bias in the biological measurement, 
and a major research area in computational biology involves developing statistical tools to 
separate signal from noise in high-throughput gene expression studies. Such studies are 
often used to determine the genes implicated in a disorder: one might compare microarray 
data from cancerous epithelial cells to data from non-cancerous cells to determine the 
transcripts that are up-regulated and down-regulated in a particular population of cancer 
cells. 
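
The comparison described above can be sketched as a log2 fold-change calculation between two conditions. This is a minimal illustration with made-up expression values and hypothetical gene names; it deliberately omits the replicate handling and statistical testing that the noise problem discussed above makes necessary in practice.

```python
import math


def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """log2((tumor + pseudocount) / (normal + pseudocount)) for genes present in both samples."""
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in tumor
        if gene in normal
    }


def classify(fold_changes, threshold=1.0):
    """Split genes into up- and down-regulated lists using a log2 threshold."""
    up = sorted(g for g, fc in fold_changes.items() if fc >= threshold)
    down = sorted(g for g, fc in fold_changes.items() if fc <= -threshold)
    return up, down


# Hypothetical expression intensities for three genes.
tumor = {"geneA": 799, "geneB": 99, "geneC": 24}
normal = {"geneA": 99, "geneB": 101, "geneC": 199}

fold = log2_fold_changes(tumor, normal)
up, down = classify(fold)
print("up-regulated:", up)
print("down-regulated:", down)
```

The pseudocount is an assumption added here to avoid dividing by zero for unexpressed genes.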

Analysis of regulation 

Regulation is the complex orchestration of events starting with an extracellular signal such 
as a hormone and leading to an increase or decrease in the activity of one or more proteins. 
Bioinformatics techniques have been applied to explore various steps in this process. For 
example, promoter analysis involves the identification and study of sequence motifs in the 
DNA surrounding the coding region of a gene. These motifs influence the extent to which 
that region is transcribed into mRNA. Expression data can be used to infer gene regulation: 
one might compare microarray data from a wide variety of states of an organism to form 
hypotheses about the genes involved in each state. In a single-cell organism, one might 
compare stages of the cell cycle, along with various stress conditions (heat shock, 
starvation, etc.). One can then apply clustering algorithms to that expression data to 
determine which genes are co-expressed. For example, the upstream regions (promoters) of 
co-expressed genes can be searched for over-represented regulatory elements. 
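
Searching promoters of co-expressed genes for over-represented elements can be sketched by counting short words (k-mers) and comparing their frequency with a background set. The sequences, the word length and the smoothing scheme below are all illustrative assumptions; real motif-finding tools use more careful statistics.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(cluster, background, k=6, min_ratio=2.0):
    """Return (k-mer, enrichment ratio) pairs that are over-represented in the cluster."""
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    results = []
    for kmer, n in fg.items():
        fg_frac = n / len(cluster)
        bg_frac = (bg[kmer] + 1) / (len(background) + 1)  # add-one smoothing
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            results.append((kmer, ratio))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical upstream regions of co-expressed genes and of unrelated genes.
cluster = ["AATGACTCAT", "CCTGACTCGG", "GTTGACTCTA"]
background = ["AACCGGTTAA", "CCGGAATTCC", "GGTTAACCGG", "TTAACCGGTT"]

hits = enriched_kmers(cluster, background)
print(hits)
```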

Analysis of protein expression 

Protein microarrays and high throughput (HT) mass spectrometry (MS) can provide a 
snapshot of the proteins present in a biological sample. Bioinformatics is very much 
involved in making sense of protein microarray and HT MS data. The former approach faces 
problems similar to those of microarrays targeted at mRNA; the latter involves the problem of 
matching large amounts of mass data against predicted masses from protein sequence 
databases, and the complicated statistical analysis of samples where multiple but 
incomplete peptides from each protein are detected. 
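
Matching observed masses against masses predicted from a sequence database can be sketched as follows. The example assumes peptides produced by a simplified trypsin-like cleavage (after K or R, ignoring the proline exception and missed cleavages) and uses standard monoisotopic residue masses; the database entries, observed values and tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def cleave_after_k_or_r(protein):
    """Simplified digestion: cut after every K or R (assumption, see lead-in)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def match_masses(observed, database, tol=0.01):
    """Return (observed mass, protein name, peptide) for every match within tol daltons."""
    hits = []
    for name, seq in database.items():
        for pep in cleave_after_k_or_r(seq):
            mass = peptide_mass(pep)
            for obs in observed:
                if abs(obs - mass) <= tol:
                    hits.append((obs, name, pep))
    return hits


database = {"protein_1": "GASKPVR", "protein_2": "MLK"}  # hypothetical entries
observed = [361.196, 390.230]                            # hypothetical measurements
hits = match_masses(observed, database)
print(hits)
```

Note that only one of the two peptides of protein_1 is matched here, which is the incomplete-coverage situation the paragraph above describes.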

Analysis of mutations in cancer 

In cancer, the genomes of affected cells are rearranged in complex or even unpredictable 
ways. Massive sequencing efforts are used to identify previously unknown point mutations 
in a variety of genes in cancer. Bioinformaticians continue to produce specialized 
automated systems to manage the sheer volume of sequence data produced, and they 
create new algorithms and software to compare the sequencing results to the growing 
collection of human genome sequences and germline polymorphisms. New physical 
detection technologies are employed, such as oligonucleotide microarrays to identify 
chromosomal gains and losses (called comparative genomic hybridization), and single 
nucleotide polymorphism arrays to detect known point mutations. These detection methods 
simultaneously measure several hundred thousand sites throughout the genome, and when 
used in high-throughput to measure thousands of samples, generate terabytes of data per 
experiment. Again the massive amounts and new types of data generate new opportunities 
for bioinformaticians. The data is often found to contain considerable variability, or noise, 
and thus Hidden Markov model and change-point analysis methods are being developed to 
infer real copy number changes. 
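
The idea behind change-point analysis of noisy copy-number data can be sketched with the simplest case: finding the single split of a series of probe values that minimises the squared error around the two segment means. This is an illustrative least-squares sketch only; it is not a Hidden Markov model and not the specific method of any published tool, and the data values are made up.

```python
def best_change_point(values):
    """Return (index, left_mean, right_mean) for the split minimising squared error.

    The change point is the index of the first value in the right-hand segment.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    best = None
    for k in range(1, n):
        left, right = values[:k], values[k:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((v - left_mean) ** 2 for v in left) + sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, k, left_mean, right_mean)
    return best[1], best[2], best[3]


# Hypothetical log2 ratios along a chromosome: a gain starts at the sixth probe.
log_ratios = [0.1, -0.1, 0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 1.1, 0.9]
index, left_mean, right_mean = best_change_point(log_ratios)
print(index, round(left_mean, 2), round(right_mean, 2))
```

Real data contain several change points per chromosome, so practical methods apply this kind of search recursively or use a probabilistic segmentation model.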

Another type of data that requires novel informatics development is the analysis of lesions 
found to be recurrent among many tumors. 

Prediction of protein structure 

Protein structure prediction is another important application of bioinformatics. The amino 
acid sequence of a protein, the so-called primary structure, can be easily determined from 
the sequence on the gene that codes for it. In the vast majority of cases, this primary 
structure uniquely determines a structure in its native environment. (Of course, there are 
exceptions, such as the bovine spongiform encephalopathy - aka Mad Cow Disease - prion.) 
Knowledge of this structure is vital in understanding the function of the protein. For lack of 
better terms, structural information is usually classified as one of secondary, tertiary and 
quaternary structure. A viable general solution to such predictions remains an open 
problem. As of now, most efforts have been directed towards heuristics that work most of 
the time. 

One of the key ideas in bioinformatics is the notion of homology. In the genomic branch of 
bioinformatics, homology is used to predict the function of a gene: if the sequence of gene 
A, whose function is known, is homologous to the sequence of gene B, whose function is 
unknown, one could infer that B may share A's function. In the structural branch of 
bioinformatics, homology is used to determine which parts of a protein are important in 
structure formation and interaction with other proteins. In a technique called homology 
modeling, this information is used to predict the structure of a protein once the structure of 
a homologous protein is known. This currently remains the only way to predict protein 
structures reliably. 

One example of this is the structural homology between hemoglobin in humans and the 
hemoglobin in legumes (leghemoglobin). Both serve the same purpose of transporting 
oxygen in the organism. Although the two proteins have very different amino 
acid sequences, their protein structures are virtually identical, which reflects their 
near-identical purposes. 

Other techniques for predicting protein structure include protein threading and de novo 
(from scratch) physics-based modeling. 

See also: structural motif and structural domain. 

Comparative genomics 

The core of comparative genome analysis is the establishment of the correspondence 
between genes (orthology analysis) or other genomic features in different organisms. It is 
these intergenomic maps that make it possible to trace the evolutionary processes 
responsible for the divergence of two genomes. A multitude of evolutionary events acting at 
various organizational levels shape genome evolution. At the lowest level, point mutations 
affect individual nucleotides. At a higher level, large chromosomal segments undergo 
duplication, lateral transfer, inversion, transposition, deletion and insertion. Ultimately, 
whole genomes are involved in processes of hybridization, polyploidization and 
endosymbiosis, often leading to rapid speciation. The complexity of genome evolution poses 
many exciting challenges to developers of mathematical models and algorithms, who have 
recourse to a spectrum of algorithmic, statistical and mathematical techniques, ranging from 
exact, heuristic, fixed-parameter and approximation algorithms for problems based on 
parsimony models to Markov Chain Monte Carlo algorithms for Bayesian analysis of 
problems based on probabilistic models. 

Many of these studies are based on homology detection and the computation of protein 
families. 

Modeling biological systems 

Systems biology involves the use of computer simulations of cellular subsystems (such as 
the networks of metabolites and enzymes which comprise metabolism, signal 
transduction pathways and gene regulatory networks) to both analyze and visualize the 
complex connections of these cellular processes. Artificial life or virtual evolution attempts 
to understand evolutionary processes via the computer simulation of simple (artificial) life 
forms. 

High-throughput image analysis 

Computational technologies are used to accelerate or fully automate the processing, 
quantification and analysis of large amounts of high-information-content biomedical 
imagery. Modern image analysis systems augment an observer's ability to make 
measurements from a large or complex set of images, by improving accuracy, objectivity, or 
speed. A fully developed analysis system may completely replace the observer. Although 
these systems are not unique to biomedical imagery, biomedical imaging is becoming more 
important for both diagnostics and research. Some examples are: 

• high-throughput and high-fidelity quantification and sub-cellular localization 
(high-content screening, cytohistopathology) 

• morphometrics 

• clinical image analysis and visualization 

• determining the real-time air-flow patterns in breathing lungs of living animals 

• quantifying occlusion size in real-time imagery from the development of and recovery 
during arterial injury 

• making behavioral observations from extended video recordings of laboratory animals 

• infrared measurements for metabolic activity determination 

• inferring clone overlaps in DNA mapping, e.g. the Sulston score 

Protein-protein docking 

In the last two decades, tens of thousands of protein three-dimensional structures have 
been determined by X-ray crystallography and protein nuclear magnetic resonance 
spectroscopy (protein NMR). One central question for the biological scientist is whether it 
is practical to predict possible protein-protein interactions based only on these 3D shapes, 
without doing protein-protein interaction experiments. A variety of methods have been 
developed to tackle the protein-protein docking problem, though it seems that there is still 
much work to be done in this field. 

Software and tools 

Software tools for bioinformatics range from simple command-line tools, to more complex 
graphical programs and standalone web-services available from various bioinformatics 
companies or public institutions. The computational biology tool best-known among 
biologists is probably BLAST, an algorithm for determining the similarity of arbitrary 
sequences against other sequences, possibly from curated databases of protein or DNA 
sequences. BLAST is one of a number of generally available programs for doing sequence 
alignment. The NCBI provides a popular web-based implementation that searches their 
databases. 
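
To give a feel for what a sequence-similarity search computes, the sketch below implements Smith-Waterman local alignment with a linear gap penalty. This is a different, exact dynamic-programming technique and not BLAST itself, which relies on a faster heuristic; the scoring values and example sequences are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, part_a, part_b = smith_waterman("TTTACGTGGG", "CCACGTAA")
print(score, part_a, part_b)
```

The full matrix costs time and memory proportional to the product of the two sequence lengths, which is why database-scale searches use heuristics instead.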

Web services in bioinformatics 

SOAP and REST-based interfaces have been developed for a wide variety of bioinformatics 
applications allowing an application running on one computer in one part of the world to 
use algorithms, data and computing resources on servers in other parts of the world. The 
main advantage lies in the end user not having to deal with software and database 
maintenance overheads. Basic bioinformatics services are classified by the EBI into three 
categories: SSS (Sequence Search Services), MSA (Multiple Sequence Alignment) and BSA 
(Biological Sequence Analysis). The availability of these service-oriented bioinformatics 
resources demonstrates the applicability of web-based bioinformatics solutions, which range 
from a collection of standalone tools with a common data format under a single, standalone 
or web-based interface, to integrative, distributed and extensible bioinformatics workflow 
management systems. 

See also 
Related topics 

Biocybernetics 
Bioinformatics companies 
Biologically-inspired computing 
Biomedical informatics 
Computational biology 
Computational biomodeling 
Computational genomics 
DNA sequencing theory 
Dot plot (bioinformatics) 
Dry lab 
Margaret Oakley Dayhoff 
Metabolic network modelling 
Molecular Design software 
Morphometrics 
Natural computation 
Pharmaceutical company 
Protein-protein interaction prediction 
List of nucleic acid simulation software 
List of numerical analysis software 
List of protein structure prediction software 
List of scientific journals in bioinformatics 

Related fields 

Applied mathematics 
Artificial intelligence 
Biology 
Cheminformatics 
Clinomics 
Comparative genomics 
Computational biology 
Computational epigenetics 
Computational science 
Computer science 
Cybernetics 
Ecoinformatics 
Genomics 
Informatics 
Information theory 
Mathematical biology 
Molecular modelling 
Neuroinformatics 
Proteomics 
Pervasive adaptation 
Scientific computing 
Statistics 
Structural biology 
Systems biology 
Theoretical biology 
Veterinary informatics 

License 

Version 1.2, November 2002 

Copyright (C) 2000,2001,2002 Free Software Foundation, Inc. 
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 
Everyone is permitted to copy and distribute verbatim copies 
of this license document, but changing it is not allowed. 

0. PREAMBLE 

The purpose of this License is to make a manual, textbook, or other functional and useful document "free" in the sense of freedom: to assure everyone 
the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License 
preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others. 
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the 
GNU General Public License, which is a copyleft license designed for free software. 

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should 
come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any 
textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose 
is instruction or reference. 

1. APPLICABILITY AND DEFINITIONS 

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under 

the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated 

herein. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you". You accept the 

license if you copy, modify or distribute the work in a way reguiring permission under copyright law. 

A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or 

translated into another language. 

A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or 

authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. 

(Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter 

of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them. 

The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the
Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant.
The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document
is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public,
that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for
drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats
suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to
thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of
text. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using
a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image
formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML
or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some
word processors for output purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License
requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent
appearance of the work's title, preceding the beginning of the body of the text.

A section "Entitled XYZ" means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that
translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as "Acknowledgements", "Dedications",
"Endorsements", or "History".) To "Preserve the Title" of such a section when you modify the Document means that it remains a section "Entitled XYZ"
according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers
are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty
Disclaimers may have is void and has no effect on the meaning of this License.

2. VERBATIM COPYING 

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, 
and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to 
those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. 
However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in 
section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY 

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's 
license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the 
front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front 
cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying 
with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in 
other respects. 

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, 
and continue the rest onto adjacent pages. 

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy 
along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has 
access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter 
option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will 
remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or 
retailers) of that edition to the public. 

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a 
chance to provide you with an updated version of the Document. 

4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified 
Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the 
Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version: 

1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there 
were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version 
gives permission. 

2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together 
with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this 
requirement. 

3. State on the Title page the name of the publisher of the Modified Version, as the publisher. 

4. Preserve all the copyright notices of the Document. 

5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.

6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this
License, in the form shown in the Addendum below. 

7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.

8. Include an unaltered copy of this License. 

9. Preserve the section Entitled "History", Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the 
Modified Version as given on the Title Page. If there is no section Entitled "History" in the Document, create one stating the title, year, authors, and 
publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence. 

10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network 
locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network 
location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives 
permission. 

11. For any section Entitled "Acknowledgements" or "Dedications", Preserve the Title of the section, and preserve in the section all the substance and 
tone of each of the contributor acknowledgements and/or dedications given therein. 

12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered
part of the section titles.

13. Delete any section Entitled "Endorsements". Such a section may not be included in the Modified Version. 

14. Do not retitle any existing section to be Entitled "Endorsements" or to conflict in title with any Invariant Section. 

15. Preserve any Warranty Disclaimers. 

If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the
Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the
Modified Version's license notice. These titles must be distinct from any other section titles.

You may add a section Entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example,
statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover
Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by)
any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the
old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply
endorsement of any Modified Version.

5. COMBINING DOCUMENTS 

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions,
provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant
Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are
multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in
parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section
titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled "History" in the various original documents, forming one section Entitled "History"; likewise
combine any sections Entitled "Acknowledgements", and any sections Entitled "Dedications". You must delete all sections Entitled "Endorsements."

6. COLLECTIONS OF DOCUMENTS 

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this 
License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim 
copying of each of the documents in all other respects. 

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into 
the extracted document, and follow this License in all other respects regarding verbatim copying of that document. 

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution 
medium, is called an "aggregate" if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond 
what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which 
are not themselves derivative works of the Document. 

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire
aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers
if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant 
Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in 
addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, 
and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and 
disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will 
prevail. 

If a section in the Document is Entitled "Acknowledgements", "Dedications", or "History", the requirement (section 4) to Preserve its Title (section 1) will
typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, 
sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received 
copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 

10. FUTURE REVISIONS OF THIS LICENSE 

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be 
similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or
any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has 
been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any 
version ever published (not as a draft) by the Free Software Foundation. 

How to use this License for your documents 

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices 
just after the title page: 

Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the "with...Texts." line with this:

with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software
license, such as the GNU General Public License, to permit their use in free software.



